REGULARIZED DISTRIBUTIONS AND ENTROPIC
STABILITY OF CRAMER’S CHARACTERIZATION
OF THE NORMAL LAW

S. G. BOBKOV, G. P. CHISTYAKOV, AND F. GÖTZE


Abstract. For regularized distributions we establish stability of the characterization of the 
normal law in Cramer’s theorem with respect to the total variation norm and the entropic
distance. As part of the argument, Sapogov-type theorems are refined for random variables 
with finite second moment. 


1. Introduction 

Let X and Y be independent random variables. A theorem of Cramer [Cr] indicates
that, if the sum X + Y has a normal distribution, then both X and Y are normal. P. Levy
established stability of this characterization property with respect to the Levy distance, which
is formulated as follows. Given $\varepsilon > 0$ and distribution functions F, G,

\[ L(F * G, \Phi) < \varepsilon \ \Longrightarrow \ L(F, \Phi_{a_1,\sigma_1}) < \delta_\varepsilon, \quad L(G, \Phi_{a_2,\sigma_2}) < \delta_\varepsilon, \]

for some $a_1, a_2 \in \mathbb{R}$ and $\sigma_1, \sigma_2 > 0$, where $\delta_\varepsilon$ only depends on $\varepsilon$, and in such a way that $\delta_\varepsilon \to 0$
as $\varepsilon \to 0$. Here $\Phi_{a,\sigma}$ stands for the distribution function of the normal law $N(a, \sigma^2)$ with mean
$a$ and standard deviation $\sigma$, i.e., with density

\[ \varphi_{a,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-a)^2/(2\sigma^2)}, \quad x \in \mathbb{R}, \]

and we omit indices in the standard case $a = 0$, $\sigma = 1$. As usual, F * G denotes the convolution
of the corresponding distributions.

The problem of quantitative versions of this stability property of the normal law has been
intensively studied in many papers, starting with results by Sapogov [Sa1-3] and ending with
results by Chistyakov and Golinskii [C-G], who found the correct asymptotic of the best
possible error function $\varepsilon \to \delta_\varepsilon$ for the Levy distance. See also [Z1], [M], [L-O], [C], [Se], [Sh1-2].

As for stronger metrics, not much is known up to now. According to McKean ([MC], cf.
also [C-S] for some related aspects of the problem), it was Kac who raised the question about
the stability in Cramer’s theorem with respect to the entropic distance to normality. Let us
recall that, if a random variable X with finite second moment has a density p(x), its entropy

\[ h(X) = -\int_{-\infty}^{\infty} p(x) \log p(x)\, dx \]

is well-defined and is bounded from above by the entropy of the normal random variable Z,
having the same variance $\sigma^2 = \mathrm{Var}(Z) = \mathrm{Var}(X)$. The entropic distance to the normal is
given by the formula

\[ D(X) = h(Z) - h(X) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{\varphi_{a,\sigma}(x)}\, dx, \]

where in the last formula it is assumed that $a = \mathbf{E}Z = \mathbf{E}X$. It represents the Kullback-Leibler
distance from the distribution F of X to the family of all normal laws on the line.

In general, $0 \le D(X) \le \infty$, and an infinite value is possible. This quantity does not depend
on the variance of X and is stronger than the total variation distance $\|F - \Phi_{a,\sigma}\|_{\rm TV}$, as may
be seen from the Pinsker (Pinsker-Csiszar-Kullback) inequality

\[ D(X) \ge \frac{1}{2}\, \|F - \Phi_{a,\sigma}\|_{\rm TV}^2. \]

Thus, Kac’s question is whether one can bound the entropic distance D(X + Y) from below
in terms of D(X) and D(Y) for independent random variables, i.e., to have an inequality

\[ D(X + Y) \ge \alpha(D(X), D(Y)) \]

with some non-negative function $\alpha$, such that $\alpha(t, s) > 0$ for $t, s > 0$. If so, Cramer’s theorem
would be an immediate consequence of this. Note that the reverse inequality does exist, and
in case Var(X + Y) = 1 we have

\[ D(X + Y) \le \mathrm{Var}(X)\, D(X) + \mathrm{Var}(Y)\, D(Y), \]

which is due to the general entropy power inequality, cf. [D-C-T].

It turned out that Kac’s question has a negative solution. More precisely, for any $\varepsilon > 0$, one
can construct independent random variables X and Y with absolutely continuous symmetric
distributions F, G, and with Var(X) = Var(Y) = 1, such that

a) $D(X + Y) < \varepsilon$;

b) $\|F - \Phi_{a,\sigma}\|_{\rm TV} > c$ and $\|G - \Phi_{a,\sigma}\|_{\rm TV} > c$, for all $a \in \mathbb{R}$ and $\sigma > 0$,

where c > 0 is an absolute constant, see [B-C-G1]. In particular, D(X) and D(Y) are bounded
away from zero. Moreover, refined analytic tools show that the random variables may be chosen
to be identically distributed, i.e., a)-b) hold with F = G, see [B-C-G2].

Nevertheless, Kac’s problem remains of interest for subclasses of probability measures
obtained by convolution with a “smooth” distribution. The main purpose of this note is to
give an affirmative solution to the problem in the (rather typical) situation, when independent
Gaussian noise is added to the given random variables. That is, for a small parameter $\sigma > 0$,
we consider the regularized random variables

\[ X_\sigma = X + \sigma Z, \qquad Y_\sigma = Y + \sigma Z, \]

where Z denotes a standard normal random variable, independent of X, Y. As a main result,
we prove:

Theorem 1.1. Let X, Y be independent random variables with Var(X + Y) = 1. Given
$0 < \sigma \le 1$, the regularized random variables $X_\sigma$ and $Y_\sigma$ satisfy

where c > 0 is an absolute constant, and 

\[ D = \sigma^2 \big(\mathrm{Var}(X_\sigma)\, D(X_\sigma) + \mathrm{Var}(Y_\sigma)\, D(Y_\sigma)\big). \]

Thus, if $D(X_\sigma + Y_\sigma)$ is small, the entropic distances $D(X_\sigma)$ and $D(Y_\sigma)$ have to be small, as
well. In particular, Cramer’s theorem is a consequence of this statement. However, it is not
clear whether the above lower bound is optimal with respect to the couple $(D(X_\sigma), D(Y_\sigma))$,
and perhaps the logarithmic term in the exponent may be removed. As we will see, a certain
improvement of the bound can be achieved, when X and Y have equal variances.

Beyond the realm of results around P. Levy’s theorem, there has recently been renewed
interest in other related stability problems in different areas of Analysis and Geometry. One
can mention, for example, the problems of sharpness of the Brunn-Minkowski and Sobolev-type
inequalities (cf. [F-M-P1-2], [Seg], [B-G-R-S]).

We start with the description and refinement of Sapogov-type theorems about the normal
approximation in Kolmogorov distance (Sections 2-3) and then turn to analogous results for
the Levy distance (Section 4). A version of Theorem 1.1 for the total variation distance is given
in Section 5. Sections 6-7 deal with the problem of bounding the tail function $\mathbf{E} X^2 1_{\{|X| \ge T\}}$ in
terms of the entropic distances D(X) and D(X + Y), which is an essential part of Kac’s problem.
A first application, namely, to a variant of Chistyakov-Golinskii’s theorem, is discussed
in Section 8. In Section 9, we develop several estimates connecting the entropic distance D(X)
and the uniform deviation of the density p from the corresponding normal density. In Section
10 an improved variant of Theorem 1.1 is derived in the case where X and Y have equal
variances. The general case is treated in Section 11. Finally, some relations between different
distances in the space of probability distributions on the line are postponed to the appendix.

2. Sapogov-type theorems for Kolmogorov distance 

Throughout the paper we consider the following classical metrics in the space of probability
distributions on the real line:

1) The Kolmogorov or $L^\infty$-distance $\|F - G\| = \sup_x |F(x) - G(x)|$;

2) The Levy distance

\[ L(F, G) = \min\{h \ge 0 : G(x - h) - h \le F(x) \le G(x + h) + h, \ \forall x \in \mathbb{R}\}; \]

3) The Kantorovich or $L^1$-distance

\[ W_1(F, G) = \int_{-\infty}^{\infty} |F(x) - G(x)|\, dx; \]

4) The total variation distance

\[ \|F - G\|_{\rm TV} = \sup \sum_{k=1}^{n} \big|(F(x_k) - G(x_k)) - (F(y_k) - G(y_k))\big|, \]

where the sup is taken over all finite collections of points $y_1 < x_1 < \dots < y_n < x_n$.

In these relations, F and G are arbitrary distribution functions. Note that the quantity
$W_1(F, G)$ is finite, as long as both F and G have a finite first absolute moment.
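
To make the first two definitions concrete, here is a minimal Python sketch that approximates the Kolmogorov and Levy distances between two normal distribution functions on a finite grid. The grid, the tolerance, the particular parameters and the function names are assumptions of this illustration; the Levy distance is located by bisection, using the fact that the defining two-sided inequality, once true for some h, stays true for all larger h.

```python
import math

def normal_cdf(x, a=0.0, s=1.0):
    """Distribution function of N(a, s^2)."""
    return 0.5 * (1.0 + math.erf((x - a) / (s * math.sqrt(2.0))))

def kolmogorov(F, G, grid):
    """Grid approximation of sup_x |F(x) - G(x)|."""
    return max(abs(F(x) - G(x)) for x in grid)

def levy(F, G, grid, tol=1e-6):
    """Grid approximation of the Levy distance, found by bisection on h."""
    def admissible(h):
        return all(G(x - h) - h <= F(x) <= G(x + h) + h for x in grid)
    lo, hi = 0.0, 1.0   # h = 1 always satisfies the defining inequalities
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if admissible(mid):
            hi = mid
        else:
            lo = mid
    return hi

grid = [-8.0 + 0.01 * k for k in range(1601)]
F = lambda x: normal_cdf(x, 0.0, 1.0)
G = lambda x: normal_cdf(x, 0.3, 1.5)

kol = kolmogorov(F, G, grid)
lev = levy(F, G, grid)
```

On any grid the computed Levy distance cannot exceed the computed Kolmogorov distance (up to the bisection tolerance), which reflects the general relation $L(F, G) \le \|F - G\|$.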

In the sequel, $\Phi_{a,v}$ or $N(a, v^2)$ denote the normal distribution (function) with parameters
$(a, v)$, $a \in \mathbb{R}$, $v > 0$. If $a = 0$, we write $\Phi_v$, and write $\Phi$ in the standard case $a = 0$, $v = 1$.
Now, let X and Y be independent random variables with distribution functions F and G.
Then the convolution F * G represents the distribution of the sum X + Y. If both random
variables have mean zero and unit variances, Sapogov’s main stability result reads as follows:

Theorem 2.1. Suppose that EX = EY = 0 and Var(X) = Var(Y) = 1. If

\[ \|F * G - \Phi * \Phi\| \le \varepsilon < 1, \]

then with some absolute constant C

\[ \|F - \Phi\| \le \frac{C}{\sqrt{\log\frac{1}{\varepsilon}}} \quad \text{and} \quad \|G - \Phi\| \le \frac{C}{\sqrt{\log\frac{1}{\varepsilon}}}. \]

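In the general case (that is, when there are no finite moments), the conclusion is somewhat
weaker. Namely, with $\varepsilon \in (0,1)$, we associate

\[ a_1 = \int_{-N}^{N} x\, dF(x), \qquad \sigma_1^2 = \int_{-N}^{N} x^2\, dF(x) - a_1^2 \quad (\sigma_1 \ge 0), \]

and similarly $(a_2, \sigma_2^2)$ for the distribution function G, where $N = N(\varepsilon) = 1 + \sqrt{2\log(1/\varepsilon)}$.
In the sequel, we also use the function

\[ m(\sigma, \varepsilon) = \min\Big\{\frac{1}{\sqrt{\sigma}},\ \log\log\frac{e^e}{\varepsilon}\Big\}, \qquad \sigma > 0, \ 0 < \varepsilon \le 1. \]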
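
The truncation level $N(\varepsilon)$ and the function $m(\sigma, \varepsilon)$ are simple to evaluate. The following Python sketch does so and also evaluates the expression $C\, m(\sigma, \varepsilon) / (\sigma \sqrt{\log(1/\varepsilon)})$, which is the shape of the right-hand sides in the Sapogov-type bounds of this section. The function names are ours, and the constant C is a placeholder (the theorems only assert that some absolute constant exists).

```python
import math

def truncation_level(eps):
    """N(eps) = 1 + sqrt(2 log(1/eps)), for 0 < eps <= 1."""
    return 1.0 + math.sqrt(2.0 * math.log(1.0 / eps))

def m(sigma, eps):
    """m(sigma, eps) = min{ 1/sqrt(sigma), log log(e^e / eps) }."""
    return min(1.0 / math.sqrt(sigma),
               math.log(math.log(math.e ** math.e / eps)))

def sapogov_rate(sigma, eps, C=1.0):
    """C * m(sigma, eps) / (sigma * sqrt(log(1/eps))); C is a placeholder value."""
    return C * m(sigma, eps) / (sigma * math.sqrt(math.log(1.0 / eps)))

# Example: for sigma = 0.25 and eps = 1e-3 the minimum is attained by 1/sqrt(sigma).
example_m = m(0.25, 1e-3)
example_N = truncation_level(math.exp(-2.0))
```

Note that $m(\sigma, \varepsilon) \ge 1$ whenever $\sigma \le 1$, since $\log\log(e^e/\varepsilon) \ge 1$ for $0 < \varepsilon \le 1$.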


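Theorem 2.2. Assume $\|F * G - \Phi\| \le \varepsilon < 1$. If F has median zero, and $\sigma_1, \sigma_2 > 0$, then
with some absolute constant C

\[ \|F - \Phi_{a_1,\sigma_1}\| \le \frac{C}{\sigma_1 \sqrt{\log\frac{1}{\varepsilon}}}\ m(\sigma_1, \varepsilon), \]

and similarly for G.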


Originally, Sapogov derived a weaker bound in [Sa1-2] with worse behaviour with respect
to both $\sigma_1$ and $\varepsilon$. In [Sa3] he gave an improvement,

\[ \|F - \Phi_{a_1,\sigma_1}\| \le \frac{C}{\sigma_1^3 \sqrt{\log\frac{1}{\varepsilon}}}, \]

with a correct asymptotic of the right-hand side with respect to $\varepsilon$, cf. also [L-O]. The correctness
of the asymptotic with respect to $\varepsilon$ was studied in [M], cf. also [C]. In 1976 Senatov [Se1],
using the ridge property of characteristic functions, improved the factor $\sigma_1^3$ to $\sigma_1^{3/2}$, i.e.,

\[ \|F - \Phi_{a_1,\sigma_1}\| \le \frac{C}{\sigma_1^{3/2} \sqrt{\log\frac{1}{\varepsilon}}}. \qquad (2.1) \]

He also emphasized that the presence of $\sigma_1$ in the bound is essential. A further improvement
of the power of $\sigma_1$ is due to Shiganov [Sh1-2]. Moreover, at the expense of an additional
$\varepsilon$-dependent factor, one can replace $\sigma_1^{3/2}$ with $\sigma_1$. As shown in [C-G], see Remark on p. 2861,

\[ \|F - \Phi_{a_1,\sigma_1}\| \le \frac{C \log\log\frac{e^e}{\varepsilon}}{\sigma_1 \sqrt{\log\frac{1}{\varepsilon}}}. \qquad (2.2) \]

Therefore, Theorem 2.2 is just the combination of the two results, (2.1) and (2.2).

Let us emphasize that all proofs of these theorems use the methods of complex analysis.
Moreover, up to now there is no “Real Analysis” proof of the Cramer theorem and of its
extensions in the form of Sapogov-type results. This, however, does not concern the case of
identically distributed summands, cf. [B-C-G2].

We will discuss the bounds in the Levy distance in the next sections. 

The assumption about the median in Theorem 2.2 may be weakened to the condition that
the medians of X and Y, m(X) and m(Y), are bounded in absolute value by a constant. For
example, if EX = EY = 0 and Var(X + Y) = 1, and if, for definiteness, $\mathrm{Var}(X) \le 1/2$, then,
by Chebyshev’s inequality, $|m(X)| \le 1$, while $|m(Y)|$ will be bounded by an absolute constant,
when $\varepsilon$ is small enough, due to the main hypothesis $\|F * G - \Phi\| \le \varepsilon$.

Moreover, if the variances of X and Y are bounded away from zero, the statement of
Theorem 2.2 holds with $a_1 = 0$, and the factor $\sigma_1$ can be replaced with the standard deviation
of X. In the next section, we recall some standard arguments in order to justify this conclusion
and give a more general version of Theorem 2.2 involving variances:


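Theorem 2.3. Let EX = EY = 0, Var(X + Y) = 1. If $\|F * G - \Phi\| \le \varepsilon < 1$, then with
some absolute constant C

\[ \|F - \Phi_{v_1}\| \le \frac{C\, m(v_1, \varepsilon)}{v_1 \sqrt{\log\frac{1}{\varepsilon}}}, \qquad \|G - \Phi_{v_2}\| \le \frac{C\, m(v_2, \varepsilon)}{v_2 \sqrt{\log\frac{1}{\varepsilon}}}, \]

where $v_1^2 = \mathrm{Var}(X)$, $v_2^2 = \mathrm{Var}(Y)$ $(v_1, v_2 > 0)$.

Under the stated assumptions, Theorem 2.3 is stronger than Theorem 2.2, since $v_1 \ge \sigma_1$.
Another advantage of this formulation is that $v_1$ does not depend on $\varepsilon$, while $\sigma_1$ does.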


3. Proof of Theorem 2.3 


Let X and Y be independent random variables with distribution functions F and G,
respectively, with EX = EY = 0 and Var(X + Y) = 1. We assume that

\[ \|F * G - \Phi\| \le \varepsilon < 1, \]

and keep the same notations as in Section 2. Recall that $N = N(\varepsilon) = 1 + \sqrt{2\log(1/\varepsilon)}$.

The proof of Theorem 2.3 is entirely based on Theorem 2.2. We will need:


Lemma 3.1. With some absolute constant C we have

\[ 0 \le 1 - (\sigma_1^2 + \sigma_2^2) \le C N^2 \sqrt{\varepsilon}. \]


A similar assertion, $|\sigma_1^2 + \sigma_2^2 - 1| \le C N^2 \varepsilon$, is known under the assumption that F has a
median at zero (without moment assumptions). For the proof of Lemma 3.1, we use arguments
from [Sa1] and [Se1], cf. Lemma 1. It will be convenient to divide the proof into several steps.

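Lemma 3.2. Let $\varepsilon \le \varepsilon_0 = \frac{1}{4} - \Phi(-1) = 0.0913\ldots$ Then $|m(X)| \le 2$ and $|m(Y)| \le 2$.

Indeed, let $\mathrm{Var}(X) \le 1/2$. Then $|m(X)| \le 1$, by Chebyshev’s inequality. Hence,

\[ \frac{1}{4} \le \mathbf{P}\{X \le 1,\ Y \le m(Y)\} \le \mathbf{P}\{X + Y \le m(Y) + 1\} \le \Phi(m(Y) + 1) + \varepsilon, \]

which for $\varepsilon < \frac{1}{4}$ implies that $m(Y) + 1 \ge \Phi^{-1}(\frac{1}{4} - \varepsilon)$. In particular, $m(Y) \ge -2$, if $\varepsilon \le \varepsilon_0$.

Similarly, $m(Y) \le 2$. □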
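
The numerical value of $\varepsilon_0$ and the role it plays in the last step can be checked directly. The sketch below uses `statistics.NormalDist` from the Python standard library; the test value $\varepsilon = 0.05$ is an arbitrary choice for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal law N(0, 1)

# eps_0 = 1/4 - Phi(-1)
eps0 = 0.25 - std_normal.cdf(-1.0)

# For eps <= eps_0 we have 1/4 - eps >= Phi(-1), so the quantile
# Phi^{-1}(1/4 - eps) is at least -1, which gives m(Y) >= -2.
eps = 0.05
quantile = std_normal.inv_cdf(0.25 - eps)
```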

To continue, introduce truncated random variables at level N. Put $X^* = X$ in case
$|X| \le N$, $X^* = 0$ in case $|X| > N$, and similarly $Y^*$ for Y. Note that

\[ \mathbf{E}X^* = a_1, \quad \mathrm{Var}(X^*) = \sigma_1^2, \quad \text{and} \quad \mathbf{E}Y^* = a_2, \quad \mathrm{Var}(Y^*) = \sigma_2^2. \]

By the construction, $\sigma_1 \le v_1$ and $\sigma_2 \le v_2$. In particular, $\sigma_1^2 + \sigma_2^2 \le v_1^2 + v_2^2 = 1$. Let $F^*$, $G^*$
denote the distribution functions of $X^*$, $Y^*$, respectively.

Lemma 3.3. With some absolute constant C we have

\[ \|F^* - F\| \le C\sqrt{\varepsilon}, \qquad \|G^* - G\| \le C\sqrt{\varepsilon}, \qquad \|F^* * G^* - \Phi\| \le C\sqrt{\varepsilon}. \]


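Proof. One may assume that $N = N(\varepsilon)$ is a point of continuity of both F and G. Since
the Kolmogorov distance is bounded by 1, one may also assume that $\varepsilon$ is sufficiently small,
e.g., $\varepsilon \le \min\{\varepsilon_0, \varepsilon_1\}$, where $\varepsilon_1 = \exp\{-1/(3 - 2\sqrt{2})\}$. In this case $(N - 2)^2 \ge (N - 1)^2/2$, so

\[ \Phi(-(N - 2)) = 1 - \Phi(N - 2) \le \frac{1}{2}\, e^{-(N-2)^2/2} \le \frac{1}{2}\, e^{-(N-1)^2/4} = \frac{1}{2}\sqrt{\varepsilon}. \]

By Lemma 3.2 and the basic assumption on the convolution F * G,

\[ \frac{1}{2}\, \mathbf{P}\{Y \le -N\} \le \mathbf{P}\{X \le 2,\ Y \le -N\} \le \mathbf{P}\{X + Y \le -(N - 2)\} = (F * G)(-(N - 2)) \le \Phi(-(N - 2)) + \varepsilon. \]

So, $G(-N) \le 2\Phi(-(N - 2)) + 2\varepsilon \le 3\sqrt{\varepsilon}$. Analogously, $1 - G(N) \le 3\sqrt{\varepsilon}$. Thus,

\[ \int_{\{|x| \ge N\}} dG(x) \le 6\sqrt{\varepsilon} \quad \text{as well as} \quad \int_{\{|x| \ge N\}} dF(x) \le 6\sqrt{\varepsilon}. \]

In particular, for $x < -N$, we have $|F^*(x) - F(x)| = F(x) \le 6\sqrt{\varepsilon}$, and similarly for
$x > N$. If $-N \le x < 0$, then $F^*(x) = F(x) - F(-N)$, and if $0 \le x < N$, we have
$F^*(x) = F(x) + (1 - F(N))$. In both cases, $|F^*(x) - F(x)| \le 6\sqrt{\varepsilon}$. Therefore,

\[ \|F^* - F\| \le 6\sqrt{\varepsilon}. \]

Similarly, $\|G^* - G\| \le 6\sqrt{\varepsilon}$. From this, by the triangle inequality,

\[ \|F^* * G^* - F * G\| \le \|F^* * G^* - F^* * G\| + \|F^* * G - F * G\| \le \|F^* - F\| + \|G^* - G\| \le 12\sqrt{\varepsilon}. \]

Finally,

\[ \|F^* * G^* - \Phi\| \le \|F^* * G^* - F * G\| + \|F * G - \Phi\| \le 12\sqrt{\varepsilon} + \varepsilon \le 13\sqrt{\varepsilon}. \]

□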

Proof of Lemma 3.1. Since $|X^* + Y^*| \le 2N$ and $a_1 + a_2 = \mathbf{E}(X^* + Y^*) = \int x\, d(F^* * G^*)(x)$,
we have, integrating by parts,

\[ a_1 + a_2 = \int_{-2N}^{2N} x\, d\big((F^* * G^*)(x) - \Phi(x)\big) = x\big((F^* * G^*)(x) - \Phi(x)\big)\Big|_{x=-2N}^{x=2N} - \int_{-2N}^{2N} \big((F^* * G^*)(x) - \Phi(x)\big)\, dx. \]

Hence, $|a_1 + a_2| \le 8N\, \|F^* * G^* - \Phi\|$, which, by Lemma 3.3, is bounded by $CN\sqrt{\varepsilon}$. Similarly,

\[ \mathbf{E}(X^* + Y^*)^2 - 1 = \int_{-2N}^{2N} x^2\, d\big((F^* * G^*)(x) - \Phi(x)\big) - \int_{\{|x| > 2N\}} x^2\, d\Phi(x) \]
\[ = x^2\big((F^* * G^*)(x) - \Phi(x)\big)\Big|_{x=-2N}^{x=2N} - 2\int_{-2N}^{2N} x\big((F^* * G^*)(x) - \Phi(x)\big)\, dx - \int_{\{|x| > 2N\}} x^2\, d\Phi(x). \]

Hence,

\[ \big|\mathbf{E}(X^* + Y^*)^2 - 1\big| \le 24 N^2\, \|F^* * G^* - \Phi\| + 2\int_{2N}^{\infty} x^2\, d\Phi(x). \]

The last integral asymptotically behaves like $2N\varphi(2N) < N e^{-2(N-1)^2} = N\varepsilon^4$. Therefore,
$|\mathbf{E}(X^* + Y^*)^2 - 1|$ is bounded by $CN^2\sqrt{\varepsilon}$. Finally, writing $\sigma_1^2 + \sigma_2^2 = \mathbf{E}(X^* + Y^*)^2 - (a_1 + a_2)^2$,
we get that

\[ |\sigma_1^2 + \sigma_2^2 - 1| \le \big|\mathbf{E}(X^* + Y^*)^2 - 1\big| + (a_1 + a_2)^2 \le CN^2\sqrt{\varepsilon} \]

with some absolute constant C. Lemma 3.1 follows. □


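Proof of Theorem 2.3. First note that, given $a > 0$, $\sigma > 0$, and $x \in \mathbb{R}$, the function

\[ \psi(x) = \Phi_{0,\sigma}(x) - \Phi_{a,\sigma}(x) = \Phi\Big(\frac{x}{\sigma}\Big) - \Phi\Big(\frac{x - a}{\sigma}\Big) \]

is vanishing at infinity, has a unique extreme point $x_0 = \frac{a}{2}$, and $\psi(x_0) = \int_{-a/(2\sigma)}^{a/(2\sigma)} \varphi(y)\, dy \le \frac{a}{\sigma\sqrt{2\pi}}$.
Hence, including the case $a \le 0$, as well, we get

\[ \|\Phi_{a,\sigma} - \Phi_{0,\sigma}\| \le \frac{|a|}{\sigma\sqrt{2\pi}}. \]

We apply this estimate for $a = a_1$ and $\sigma = \sigma_1$. Since EX = 0 and Var(X + Y) = 1, by
Cauchy’s and Chebyshev’s inequalities,

\[ |a_1| = \big|\mathbf{E} X 1_{\{|X| \ge N\}}\big| \le \mathbf{P}\{|X| \ge N\}^{1/2} \le \frac{1}{N} < \frac{1}{\sqrt{2\log(1/\varepsilon)}}. \]

Hence,

\[ \|\Phi_{a_1,\sigma_1} - \Phi_{0,\sigma_1}\| \le \frac{1}{2\sigma_1\sqrt{\pi \log(1/\varepsilon)}}. \]

A similar inequality also holds for the parameters $(a_2, \sigma_2)$.

Now, define the non-negative numbers $u_1 = v_1 - \sigma_1$, $u_2 = v_2 - \sigma_2$. By Lemma 3.1,

\[ CN^2\sqrt{\varepsilon} \ge 1 - (\sigma_1^2 + \sigma_2^2) = 1 - \big((v_1 - u_1)^2 + (v_2 - u_2)^2\big) = u_1(2v_1 - u_1) + u_2(2v_2 - u_2) \ge u_1 v_1 + u_2 v_2. \]

Hence,

\[ u_1 \le \frac{CN^2\sqrt{\varepsilon}}{v_1} \quad \text{and} \quad u_2 \le \frac{CN^2\sqrt{\varepsilon}}{v_2}. \]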

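These relations can be used to estimate the Kolmogorov distance $\Delta = \|\Phi_{0,v_1} - \Phi_{0,\sigma_1}\|$.

Given two parameters $\alpha > \beta > 0$, consider the function of the form $\psi(x) = \Phi(\alpha x) - \Phi(\beta x)$.
In case $x > 0$, by the mean value theorem, for some $x_0 \in (\beta x, \alpha x)$,

\[ \psi(x) = (\alpha - \beta)\, x\, \varphi(x_0) < (\alpha - \beta)\, x\, \varphi(\beta x). \]

Here, the right-hand side is maximized for $x = \frac{1}{\beta}$, which gives $\psi(x) < \frac{1}{\sqrt{2\pi e}}\, \frac{\alpha - \beta}{\beta}$. A similar
bound also holds for $x < 0$. Using this bound with $\alpha = 1/\sigma_1$ $(\sigma_1 > 0)$, $\beta = 1/v_1$, we obtain

\[ \Delta \le \frac{1}{\sqrt{2\pi e}}\, v_1 \Big(\frac{1}{\sigma_1} - \frac{1}{v_1}\Big) = \frac{1}{\sqrt{2\pi e}}\, \frac{u_1}{\sigma_1} \le \frac{CN^2\sqrt{\varepsilon}}{\sigma_1 v_1} \le \frac{CN^2\sqrt{\varepsilon}}{\sigma_1^2}. \]

Thus, applying Theorem 2.2, we get with some universal constant C > 1 that

\[ \|F - \Phi_{0,v_1}\| \le \|F - \Phi_{a_1,\sigma_1}\| + \|\Phi_{a_1,\sigma_1} - \Phi_{0,\sigma_1}\| + \|\Phi_{0,\sigma_1} - \Phi_{0,v_1}\| \]
\[ \le \frac{C}{\sigma_1\sqrt{\log\frac{1}{\varepsilon}}}\, m(\sigma_1, \varepsilon) + \frac{C}{\sigma_1\sqrt{\log\frac{1}{\varepsilon}}} + \frac{CN^2\sqrt{\varepsilon}}{\sigma_1^2} \le \frac{2C}{\sigma_1\sqrt{\log\frac{1}{\varepsilon}}}\, m(\sigma_1, \varepsilon) + \frac{CN^2\sqrt{\varepsilon}}{\sigma_1^2}. \qquad (3.1) \]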


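The obtained estimate remains valid when $\sigma_1 = 0$, as well. On the other hand,
$\sigma_1 = v_1 - u_1 \ge v_1 - \frac{CN^2\sqrt{\varepsilon}}{v_1} \ge \frac{1}{2}\, v_1$, where the last inequality is fulfilled for the range $v_1 \ge v(\varepsilon) = \sqrt{C}\, N\, (4\varepsilon)^{1/4}$.

Hence, from (3.1) and using $m(\sigma_1, \varepsilon) \le 2m(v_1, \varepsilon)$, for this range

\[ \|F - \Phi_{0,v_1}\| \le \frac{8C\, m(v_1, \varepsilon)}{v_1\sqrt{\log\frac{1}{\varepsilon}}} + \frac{4CN^2\sqrt{\varepsilon}}{v_1^2}. \]

Here, since $m(v_1, \varepsilon) \ge 1$, the first term on the right-hand side majorizes the second one, if

\[ v_1 \ge \tilde v(\varepsilon) = \frac{1}{2}\, N^2 \sqrt{\varepsilon \log\frac{1}{\varepsilon}}. \]

Therefore, when $v_1 \ge w(\varepsilon) = \max\{v(\varepsilon), \tilde v(\varepsilon)\}$, with some absolute constant $C'$ we have

\[ \|F - \Phi_{0,v_1}\| \le \frac{C'\, m(v_1, \varepsilon)}{v_1\sqrt{\log\frac{1}{\varepsilon}}}. \]

Thus, we arrive at the desired inequality for the range $v_1 \ge w(\varepsilon)$. But the function $w$ behaves
almost polynomially near zero and admits, for example, a bound of the form $w(\varepsilon) \le C''\varepsilon^{1/6}$,
$0 < \varepsilon < \varepsilon_0$, with some universal $\varepsilon_0 \in (0,1)$, $C'' > 1$. So, when $v_1 \le w(\varepsilon)$, $0 < \varepsilon < \varepsilon_0$, we have

\[ \frac{1}{v_1\sqrt{\log\frac{1}{\varepsilon}}} \ge \frac{1}{w(\varepsilon)\sqrt{\log\frac{1}{\varepsilon}}} \ge \frac{1}{C''\varepsilon^{1/6}\sqrt{\log\frac{1}{\varepsilon}}}. \]

Here, the last expression is greater than 1, as long as $\varepsilon$ is sufficiently small, say, for all
$0 < \varepsilon < \varepsilon_1$, where $\varepsilon_1$ is determined by $(C'', \varepsilon_0)$. Hence, for all such $\varepsilon$, we have a better bound

\[ \|F - \Phi_{0,v_1}\| \le \frac{1}{v_1\sqrt{\log\frac{1}{\varepsilon}}}. \]

It remains to increase the constant $C'$ in order to involve the remaining values of $\varepsilon$. A
similar conclusion is true for the distribution G. Theorem 2.3 is thus proved completely. □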


4. Stability in Cramer’s theorem for the Levy distance 

Let X and Y be independent random variables with distribution functions F and G. It
turns out that in the bound of Theorem 2.2, the parameter $\sigma_1$ can be completely removed, if
we consider the stability problem for the Levy distance. More precisely, the following theorem
was established in [C-G].

Theorem 4.1. Assume that $\|F * G - \Phi\| \le \varepsilon < 1$. If F has median zero, then with some
absolute constant C

\[ L(F, \Phi_{a_1,\sigma_1}) \le C\, \frac{(\log\log\frac{4}{\varepsilon})^2}{\sqrt{\log\frac{1}{\varepsilon}}}. \]

Recall that

\[ a_1 = \int_{-N}^{N} x\, dF(x), \qquad \sigma_1^2 = \int_{-N}^{N} x^2\, dF(x) - a_1^2 \quad (\sigma_1 \ge 0), \]

and similarly $(a_2, \sigma_2^2)$ for the distribution function G, where $N = 1 + \sqrt{2\log(1/\varepsilon)}$.

As we have already discussed, the assumption about the median may be relaxed to the 
condition that the median is bounded (by a universal constant). 

The first quantitative stability result for the Levy distance, namely,

\[ L(F, \Phi_{a_1,\sigma_1}) \le C \log^{-1/8}(1/\varepsilon), \]

was obtained in 1968 by Zolotarev [Z1], who applied his famous Berry-Esseen-type bound. The
power 1/8 was later improved to 1/4 by Senatov [Se1] and even more by Shiganov [Sh1-2].
The stated asymptotic in Theorem 4.1 is unimprovable, which was also shown in [C-G].

Note that in the assumption of Theorem 4.1, the Kolmogorov distance can be replaced with
the Levy distance $L(F * G, \Phi)$ in view of the general relations

\[ L(F * G, \Phi) \le \|F * G - \Phi\| \le (1 + M)\, L(F * G, \Phi), \qquad M = \|\Phi\|_{\rm Lip} = \frac{1}{\sqrt{2\pi}}. \]

However, in the conclusion such replacement cannot be done at the expense of a universal
constant, since we only have

\[ \|F - \Phi_{a_1,\sigma_1}\| \le (1 + M)\, L(F, \Phi_{a_1,\sigma_1}), \qquad M = \|\Phi_{a_1,\sigma_1}\|_{\rm Lip} = \frac{1}{\sigma_1\sqrt{2\pi}}. \]


Now, our aim is to replace in Theorem 4.1 the parameters $(a_1, \sigma_1)$, which depend on $\varepsilon$,
with $(0, v_1)$ like in Theorem 2.3. That is, we have the following:

Question. Assume that EX = EY = 0, Var(X + Y) = 1, and $L(F * G, \Phi) < \varepsilon < 1$. Is it
true that

\[ L(F, \Phi_{v_1}) \le C\, \frac{(\log\log\frac{4}{\varepsilon})^2}{\sqrt{\log\frac{1}{\varepsilon}}} \]

with some absolute constant C, where $v_1^2 = \mathrm{Var}(X)$?

In a sense, it is the question on the closeness of $\sigma_1$ to $v_1$ in the situation, where $\sigma_1$ is small.
Indeed, using the triangle inequality, one can write

\[ L(F, \Phi_{v_1}) \le L(F, \Phi_{a_1,\sigma_1}) + L(\Phi_{a_1,\sigma_1}, \Phi_{0,\sigma_1}) + L(\Phi_{\sigma_1}, \Phi_{v_1}). \]

Here, the first term may be estimated according to Theorem 4.1. For the second one, we have
a trivial uniform bound (over all $\sigma_1$),

\[ L(\Phi_{a_1,\sigma_1}, \Phi_{0,\sigma_1}) \le |a_1|, \]

which follows from the definition of the Levy metric. In turn, the parameter $a_1$ admits the
bound, which was already used in the proof of Theorem 2.3, namely, $|a_1| < \frac{1}{\sqrt{2\log(1/\varepsilon)}}$. This bound
behaves better than the one in Theorem 4.1, so we obtain:

Lemma 4.2. If EX = EY = 0, Var(X + Y) = 1, and $L(F * G, \Phi) < \varepsilon < 1$, then

\[ L(F, \Phi_{v_1}) \le C\, \frac{(\log\log\frac{4}{\varepsilon})^2}{\sqrt{\log\frac{1}{\varepsilon}}} + L(\Phi_{\sigma_1}, \Phi_{v_1}). \]

Thus, we are reduced to estimating the distance $L(\Phi_{\sigma_1}, \Phi_{v_1})$, which in fact should be done
in terms of $v_1^2 - \sigma_1^2$.

Lemma 4.3. Given $v \ge \sigma \ge 0$, such that $v^2 - \sigma^2 \le 1$, we have

\[ L(\Phi_\sigma, \Phi_v) \le \sqrt{(v^2 - \sigma^2)\, \log\frac{2}{v^2 - \sigma^2}}. \]

Proof. It will be clear that the asymptotic in terms of $a = \sqrt{v^2 - \sigma^2}$ is correct.

Since the normal distributions with mean zero are symmetric about the origin, the Levy
distance $L(\Phi_\sigma, \Phi_v)$ represents an optimal value $h \ge 0$, such that the inequality

\[ \Phi_\sigma(x) \le \Phi_v(x + h) + h \qquad (4.1) \]

holds true for all x. (The other inequality, $\Phi_v(x) \le \Phi_\sigma(x + h) + h$, is equivalent to (4.1).)
Moreover, for $x \le 0$, we have $\Phi_\sigma(x) \le \Phi_v(x)$, so only $x > 0$ should be taken into consideration.
We may assume $v > \sigma$, i.e., $a > 0$. Changing the variable $x = \sigma y$, $y > 0$, (4.1) becomes

\[ \Phi(y) \le \Phi\Big(\frac{\sigma y + h}{\sqrt{\sigma^2 + a^2}}\Big) + h. \qquad (4.2) \]

Here h needs to serve for all $\sigma > 0$, while $a$ is fixed. So, we need to minimize the function
$\psi(\sigma) = \frac{\sigma y + h}{\sqrt{\sigma^2 + a^2}}$. By the direct differentiation, we find that

\[ \psi'(\sigma) = \frac{a^2 y - \sigma h}{(\sigma^2 + a^2)^{3/2}} = 0 \ \Longrightarrow \ \sigma = \sigma_0 = \frac{a^2 y}{h}, \qquad \psi(\sigma_0) = \sqrt{y^2 + (h/a)^2} > y. \]

Since $\psi'(0) > 0$, we may conclude that $\psi$ is increasing for $\sigma < \sigma_0$ and is decreasing for $\sigma > \sigma_0$.
Hence, the inequality (4.2) will only be strengthened, if we replace it with

\[ \Phi(y) \le \inf_\sigma \Phi(\psi(\sigma)) + h = \min\{\Phi(\psi(0)), \Phi(\psi(\infty))\} + h = \min\Big\{\Phi\Big(\frac{h}{a}\Big), \Phi(y)\Big\} + h. \]

That is, $\Phi(y) \le \Phi(\frac{h}{a}) + h$, and since $y > 0$ is arbitrary, it is equivalent to $1 \le \Phi(\frac{h}{a}) + h$. In
other words,

\[ L(\Phi_\sigma, \Phi_v) \le L(\Phi_0, \Phi_a), \]

where $\Phi_0$ denotes the unit mass at the origin.

Thus, we are reduced to the case $\sigma = 0$. But then the Levy distance $h_0 = L(\Phi_0, \Phi_a)$ represents
the (unique) solution to the equation $1 = \Phi(\frac{h}{a}) + h$. To estimate it, we may use the bound
$1 - \Phi(t) \le \frac{1}{2}\, e^{-t^2/2}$ $(t \ge 0)$, which gives $2h_0 \le e^{-h_0^2/(2a^2)}$. After the change $h_0 = a\sqrt{2\log(c_0/a)}$, and
using $a \le 1$, we obtain

\[ 1 \ge 2c_0\sqrt{2\log\frac{c_0}{a}} \ge 2c_0\sqrt{2\log c_0}, \]

so, $4c_0^2 \log c_0^2 \le 1$. It follows that $c_0^2 \le 2$ and $h_0 \le a\sqrt{\log(2/a^2)}$, as was claimed. □
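
The chain of inequalities in this proof, $L(\Phi_\sigma, \Phi_v) \le h_0 \le \sqrt{a^2\log(2/a^2)}$ with $a^2 = v^2 - \sigma^2$, can be sanity-checked numerically. The Python sketch below does this for one pair $(\sigma, v)$; the grid-based approximation of the Levy distance, the bisection tolerance, the chosen parameters and all function names are assumptions of this illustration, not part of the proof.

```python
import math

def Phi(x, s):
    """Distribution function of N(0, s^2)."""
    return 0.5 * (1.0 + math.erf(x / (s * math.sqrt(2.0))))

def smallest_true(pred, lo=0.0, hi=1.0, tol=1e-7):
    """Bisection for the smallest h in [lo, hi] with pred(h) true (pred monotone in h)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return hi

def levy_normal(sigma, v, grid):
    """Grid approximation of L(Phi_sigma, Phi_v) from the two-sided definition."""
    F = lambda x: Phi(x, sigma)
    G = lambda x: Phi(x, v)
    return smallest_true(
        lambda h: all(G(x - h) - h <= F(x) <= G(x + h) + h for x in grid))

def h_zero(a):
    """Solution h_0 of 1 = Phi(h/a) + h, i.e. L(Phi_0, Phi_a)."""
    return smallest_true(lambda h: Phi(h, a) + h >= 1.0)

def lemma_bound(sigma, v):
    """Right-hand side of Lemma 4.3."""
    t = v * v - sigma * sigma
    return math.sqrt(t * math.log(2.0 / t))

sigma, v = 0.5, 1.0
grid = [-8.0 + 0.01 * k for k in range(1601)]
lev = levy_normal(sigma, v, grid)
h0 = h_zero(math.sqrt(v * v - sigma * sigma))
bound = lemma_bound(sigma, v)
```

For this choice one should observe `lev <= h0 <= bound`, with a visible gap at each step, which is consistent with the lemma being stated for the correct order in $a$ rather than with sharp constants.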

Remark. Attempts to derive bounds on the Levy distance $L(\Phi_\sigma, \Phi_v)$ by virtue of standard
general relations, such as Zolotarev’s Berry-Esseen-type estimate [Z2], lead to worse dependences
on $v^2 - \sigma^2$. For example, using a general relation $L(F, G)^2 \le W_1(F, G)$, cf.
Proposition A.1.1, together with the Kantorovich-Rubinshtein theorem, we get that

\[ L(\Phi_\sigma, \Phi_v)^2 \le W_1(\Phi_\sigma, \Phi_v) \le \mathbf{E}\,|\sigma Z - vZ| < v - \sigma = \frac{v^2 - \sigma^2}{v + \sigma}, \]

where $Z \sim N(0,1)$ and where we did not lose much when bounding $W_1$. This estimate has
a disadvantage in comparison with Lemma 4.3 because of a possible small denominator.

In view of Lemmas 4.2-4.3, in order to proceed, one needs to bound $v_1^2 - \sigma_1^2$ in terms of $\varepsilon$.
However, this does not seem to be possible in general without stronger hypotheses. Note that

\[ v_1^2 - \sigma_1^2 = \int_{\{|x| > N\}} x^2\, dF(x) + a_1^2. \]

Hence, we need to deal with the quadratic tail function

\[ \delta_X(T) = \int_{\{|x| \ge T\}} x^2\, dF(x), \qquad T \ge 0, \]

whose behavior at infinity will play an important role in the sequel.
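Now, combining Lemmas 4.2 and 4.3, we obtain

\[ L(F, \Phi_{v_1}) \le C\, \frac{(\log\log\frac{4}{\varepsilon})^2}{\sqrt{\log\frac{1}{\varepsilon}}} + R\big(\delta_X(N) + a_1^2\big), \]

where $R(t) = \sqrt{t\log(2/t)}$. This function is non-negative and concave in the interval $0 \le t \le 2$,
with $R(0) = 0$. Hence, it is subadditive in the sense that $R(\xi + \eta) \le R(\xi) + R(\eta)$, for all
$\xi, \eta \ge 0$, $\xi + \eta \le 2$. Hence,

\[ R\big(\delta_X(N) + a_1^2\big) \le R(\delta_X(N)) + R(a_1^2) = \Big(\delta_X(N) \log\frac{2}{\delta_X(N)}\Big)^{1/2} + \Big(a_1^2 \log\frac{2}{a_1^2}\Big)^{1/2}. \]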

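As we have already noticed, $|a_1| \le \frac{1}{N} \le A$, where $A = \frac{1}{\sqrt{1 + 2\log(1/\varepsilon)}} = \frac{1}{\sqrt{\log(e/\varepsilon^2)}}$. In particular, $|a_1| \le 1$. Since the function
$t \to t\log(e/t)$ is increasing in $0 \le t \le 1$,

\[ a_1^2 \log\frac{2}{a_1^2} \le a_1^2 \log\frac{e}{a_1^2} \le A^2 \log\frac{e}{A^2} = \frac{1 + \log\log(e/\varepsilon^2)}{\log(e/\varepsilon^2)}. \]

Taking the square root of the right-hand side, we obtain a function which can be majorized
and absorbed by the bound given in Theorem 4.1. As a result, we have arrived at the following
consequence of this theorem.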

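Theorem 4.4. Assume independent random variables X and Y have distribution functions
F and G with mean zero and with Var(X + Y) = 1. If $L(F * G, \Phi) < \varepsilon < 1$, then with some
absolute constant C

\[ L(F, \Phi_{v_1}) \le C\, \frac{(\log\log\frac{4}{\varepsilon})^2}{\sqrt{\log\frac{1}{\varepsilon}}} + \sqrt{\delta_X(N)\, \log(2/\delta_X(N))}, \]

where $v_1 = \sqrt{\mathrm{Var}(X)}$, $N = 1 + \sqrt{2\log(1/\varepsilon)}$, and $\delta_X(N) = \int_{\{|x| \ge N\}} x^2\, dF(x)$.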

It seems that in general it is not enough to know that $\mathrm{Var}(X) \le 1$ and $L(F * G, \Phi) < \varepsilon < 1$,
in order to judge the decay of the quadratic tail function $\delta_X(T)$ as $T \to \infty$. So, some additional
properties should be involved. As we will see, the entropic distance perfectly suits this idea,
so that one can start with the entropic assumption $D(X + Y) \le \varepsilon$.

5. Application of Sapogov-type results to Gaussian regularization 


In this section we consider the stability problem in Cramer’s theorem for the regularized 
distributions with respect to the total variation norm. As a basic tool, we use Theorem 2.3. 

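Thus, let X and Y be independent random variables with distribution functions F and G,
and with variances $\mathrm{Var}(X) = v_1^2$, $\mathrm{Var}(Y) = v_2^2$ $(v_1, v_2 > 0$, $v_1^2 + v_2^2 = 1)$, so that X + Y has
variance 1. What is not important (and is assumed for simplicity of notations, only), let both
X and Y have mean zero. As we know from Theorem 2.3, the main stability result asserts
that if $\|F * G - \Phi\| \le \varepsilon < 1$, then

\[ \|F - \Phi_{v_1}\| \le \frac{C\, m(v_1, \varepsilon)}{v_1\sqrt{\log\frac{1}{\varepsilon}}}, \qquad \|G - \Phi_{v_2}\| \le \frac{C\, m(v_2, \varepsilon)}{v_2\sqrt{\log\frac{1}{\varepsilon}}} \]

for some absolute constant C. Here, as before

\[ m(v, \varepsilon) = \min\Big\{\frac{1}{\sqrt{v}},\ \log\log\frac{e^e}{\varepsilon}\Big\}, \qquad v > 0, \ 0 < \varepsilon \le 1. \]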
On the other hand, such a statement - even in the case of equal variances - is no longer
true for the total variation norm. So, it is natural to use the Gaussian regularizations

\[ X_\sigma = X + \sigma Z, \qquad Y_\sigma = Y + \sigma Z, \]

where $Z \sim N(0,1)$ is independent of X and Y, and where $\sigma$ is a (small) positive parameter.
For definiteness, we assume that $0 < \sigma \le 1$. Note that

\[ \mathrm{Var}(X_\sigma) = v_1^2 + \sigma^2, \quad \mathrm{Var}(Y_\sigma) = v_2^2 + \sigma^2 \quad \text{and} \quad \mathrm{Var}(X_\sigma + Y_\sigma) = 1 + 2\sigma^2. \]

Denote by $F_\sigma$ and $G_\sigma$ the distributions of $X_\sigma$ and $Y_\sigma$, respectively. Assume $X_\sigma + Y_\sigma$ is almost
normal in the sense of the total variation norm and hence in the Kolmogorov distance, namely,

\[ \|F_\sigma * G_\sigma - N(0, 1 + 2\sigma^2)\| \le \frac{1}{2}\, \|F_\sigma * G_\sigma - N(0, 1 + 2\sigma^2)\|_{\rm TV} \le \varepsilon < 1. \]

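Note that $X_\sigma + Y_\sigma = (X + Y) + \sigma\sqrt{2}\, Z$ represents the Gaussian regularization of the sum
X + Y with parameter $\sigma\sqrt{2}$. One may also write $X_\sigma + Y_\sigma = X + (Y + \sigma\sqrt{2}\, Z)$, or equivalently,

\[ \frac{X_\sigma + Y_\sigma}{\sqrt{1 + 2\sigma^2}} = X' + Y', \quad \text{where} \quad X' = \frac{X}{\sqrt{1 + 2\sigma^2}}, \quad Y' = \frac{Y + \sigma\sqrt{2}\, Z}{\sqrt{1 + 2\sigma^2}}. \]

Thus, we are in position to apply Theorem 2.3 to the distributions of the random variables $X'$
and $Y'$ with variances

\[ v_1'^2 = \frac{v_1^2}{1 + 2\sigma^2} \quad \text{and} \quad v_2'^2 = \frac{v_2^2 + 2\sigma^2}{1 + 2\sigma^2}. \]

Using $1 + 2\sigma^2 \le 3$, it gives

\[ \|F - \Phi_{v_1}\| \le \frac{C\, m(v_1', \varepsilon)}{v_1'\sqrt{\log\frac{1}{\varepsilon}}} \le \frac{3C\, m(v_1, \varepsilon)}{v_1\sqrt{\log\frac{1}{\varepsilon}}}. \]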

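Now, we apply Proposition A.2.2 b) to the distributions F and $G = \Phi_{v_1}$ with $B = v_1$ and get

\[ \|F_\sigma - N(0, v_1^2 + \sigma^2)\|_{\rm TV} \le \frac{4v_1}{\sigma}\, \|F - \Phi_{v_1}\|^{1/2} \le \frac{4v_1}{\sigma}\, \frac{\sqrt{3C\, m(v_1, \varepsilon)}}{\sqrt{v_1}\, (\log\frac{1}{\varepsilon})^{1/4}}. \]

One may simplify this bound by using $\sqrt{v_1\, m(v_1, \varepsilon)} \le v_1^{1/4} \le 1$, and then we may conclude:

Theorem 5.1. Let F and G be distribution functions with mean zero and variances $v_1^2$, $v_2^2$,
respectively, such that $v_1^2 + v_2^2 = 1$. Let $0 < \sigma \le 1$. If the regularized distributions satisfy

\[ \frac{1}{2}\, \|F_\sigma * G_\sigma - N(0, 1 + 2\sigma^2)\|_{\rm TV} \le \varepsilon < 1, \]

then with some absolute constant C

\[ \|F_\sigma - N(0, v_1^2 + \sigma^2)\|_{\rm TV} \le \frac{C}{\sigma}\Big(\log\frac{1}{\varepsilon}\Big)^{-1/4}, \qquad \|G_\sigma - N(0, v_2^2 + \sigma^2)\|_{\rm TV} \le \frac{C}{\sigma}\Big(\log\frac{1}{\varepsilon}\Big)^{-1/4}. \]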


6. Control of tails and entropic Chebyshev-type inequality 


One of our further aims is to find an entropic version of the Sapogov stability theorem for
regularized distributions. As part of the problem, we need to bound the quadratic tail function

\[ \delta_X(T) = \mathbf{E} X^2 1_{\{|X| \ge T\}} \]

quantitatively in terms of the entropic distance D(X). Thus, assume a random variable X
has mean zero and variance Var(X) = 1, with a finite distance to the standard normal law

\[ D(X) = h(Z) - h(X) = \int_{-\infty}^{\infty} p(x) \log\frac{p(x)}{\varphi(x)}\, dx, \]

where p is the density of X and $\varphi$ is the density of N(0,1). One can also write another
representation, $D(X) = \mathrm{Ent}_\gamma(f)$, where $f = \frac{p}{\varphi}$, with respect to the standard Gaussian measure $\gamma$ on
the real line. Let us recall that the entropy functional

\[ \mathrm{Ent}_\mu(f) = \mathbf{E}_\mu f\log f - \mathbf{E}_\mu f \log \mathbf{E}_\mu f \]

is well-defined for any measurable function $f \ge 0$ on an abstract probability space $(\Omega, \mu)$,
where $\mathbf{E}_\mu$ stands for the expectation (integral) with respect to $\mu$.

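We are going to involve a variational formula for this functional (cf. e.g. [Le]): For all
measurable functions $f \ge 0$ and $g$ on $\Omega$, such that $\mathrm{Ent}_\mu(f)$ and $\mathbf{E}_\mu e^g$ are finite,

\[ \mathbf{E}_\mu fg \le \mathrm{Ent}_\mu(f) + \mathbf{E}_\mu f \log \mathbf{E}_\mu e^g. \]

Applying it on $\Omega = \mathbb{R}$ with $\mu = \gamma$ and $f = \frac{p}{\varphi}$, we notice that $\mathbf{E}_\mu f = 1$ and get that

\[ \int_{-\infty}^{\infty} p(x) g(x)\, dx \le D(X) + \log\int_{-\infty}^{\infty} e^{g(x)} \varphi(x)\, dx. \]

Take $g(x) = \frac{\alpha}{2}\, x^2 1_{\{|x| \ge T\}}$ with a parameter $\alpha \in (0,1)$. Then,

\[ \int_{-\infty}^{\infty} e^{g(x)} \varphi(x)\, dx = \gamma[-T, T] + \int_{\{|x| > T\}} e^{\alpha x^2/2} \varphi(x)\, dx \]
\[ = \gamma[-T, T] + \frac{2}{\sqrt{2\pi}} \int_T^{\infty} e^{-(1-\alpha)x^2/2}\, dx = \gamma[-T, T] + \frac{2}{\sqrt{1 - \alpha}}\, \big(1 - \Phi(T\sqrt{1 - \alpha})\big). \]

Using $\gamma[-T, T] < 1$ and the inequality $\log(1 + t) \le t$, we obtain that

\[ \log\int_{-\infty}^{\infty} e^{g(x)} \varphi(x)\, dx < \frac{2}{\sqrt{1 - \alpha}}\, \big(1 - \Phi(T\sqrt{1 - \alpha})\big). \]

Therefore,

\[ \frac{1}{2}\, \delta_X(T) \le \frac{1}{\alpha}\, D(X) + \frac{2}{\alpha\sqrt{1 - \alpha}}\, \big(1 - \Phi(T\sqrt{1 - \alpha})\big). \]

Now, we need to optimize the right-hand side over all $\alpha \in (0,1)$. First, the standard bound
$1 - \Phi(t) \le \varphi(t)/t$ gives

\[ \frac{1}{2}\, \delta_X(T) \le \frac{1}{\alpha}\, D(X) + \sqrt{\frac{2}{\pi}}\, \frac{1}{T\alpha(1 - \alpha)}\, e^{-(1-\alpha)T^2/2}. \qquad (6.1) \]

Choosing just $\alpha = 1/2$, we get

\[ \frac{1}{2}\, \delta_X(T) \le 2D(X) + \frac{4}{T}\sqrt{\frac{2}{\pi}}\, e^{-T^2/4} \le 2D(X) + 2e^{-T^2/4}, \]

where the last bound is fulfilled for $T \ge 2\sqrt{2/\pi}$. For the remaining T the obtained inequality
is fulfilled automatically, since then $2e^{-T^2/4} > 1$, while $\frac{1}{2}\, \delta_X(T) \le \frac{1}{2}\, \mathbf{E}X^2 = \frac{1}{2}$.

Thus, we have proved the following: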


Proposition 6.1. If X is a random variable with EX = 0 and Var(X) = 1, having density
p(x), then for all $T \ge 0$,

\[ \int_{\{|x| \ge T\}} x^2 p(x)\, dx \le 4D(X) + 4e^{-T^2/4}. \]

In particular, the above integral does not exceed 8D(X) for $T = 2\sqrt{\log(1/D(X))}$.
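
As an illustration of Proposition 6.1 (a numerical sketch under our own choice of example, not a part of the proof), consider the Laplace distribution with density $p(x) = \frac{1}{2b}\, e^{-|x|/b}$ and $b = 1/\sqrt{2}$, so that Var(X) = 1. Both sides of the inequality are then available in closed form: $h(X) = 1 + \log(2b)$, and the quadratic tail equals $e^{-T/b}(T^2 + 2bT + 2b^2)$. The variable and function names below are assumptions of this example.

```python
import math

b = 1.0 / math.sqrt(2.0)   # Laplace scale with variance 2 b^2 = 1

# D(X) = h(Z) - h(X) with h(Z) = (1/2) log(2 pi e), h(X) = 1 + log(2 b)
D = 0.5 * math.log(2.0 * math.pi * math.e) - (1.0 + math.log(2.0 * b))

def quadratic_tail(T):
    """Integral of x^2 p(x) over |x| >= T for the Laplace density with scale b."""
    return math.exp(-T / b) * (T * T + 2.0 * b * T + 2.0 * b * b)

def proposition_bound(T):
    """Right-hand side 4 D(X) + 4 exp(-T^2 / 4) of Proposition 6.1."""
    return 4.0 * D + 4.0 * math.exp(-T * T / 4.0)

# Check the inequality on a range of T, and the special choice of T.
checks = [quadratic_tail(0.1 * k) <= proposition_bound(0.1 * k) for k in range(101)]
T_star = 2.0 * math.sqrt(math.log(1.0 / D))
special_case_ok = quadratic_tail(T_star) <= 8.0 * D
```

In this example D(X) is about 0.072, and the bound holds with room to spare for all T on the tested range, which reflects that the proposition is tailored to small D(X) and large T.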

The choice $\alpha = 2/T^2$ in (6.1) would lead to a better asymptotic in T. Indeed, if $T \ge 2$,
then $T\alpha(1 - \alpha) \ge 1/T$, so

\[ \frac{1}{2}\, \delta_X(T) \le \frac{T^2}{2}\, D(X) + \sqrt{\frac{2}{\pi}}\, e\, T e^{-T^2/2} \le \frac{T^2}{2}\, D(X) + 3T e^{-T^2/2}. \]

Hence, we also have:

Proposition 6.2. If X is a random variable with EX = 0 and Var(X) = 1, having density
p(x), then for all $T \ge 2$,

\[ \int_{\{|x| \ge T\}} x^2 p(x)\, dx \le T^2 D(X) + 6T e^{-T^2/2}. \]

In the Gaussian case X = Z this gives an asymptotically correct bound for $T \to \infty$ (up
to a factor). Note as well that in the non-Gaussian case, from Proposition 6.1 we obtain an
entropic Chebyshev-type inequality

\[ \mathbf{P}\Big\{|X| \ge 2\sqrt{\log(1/D(X))}\Big\} \le \frac{2D(X)}{\log(1/D(X))} \qquad (D(X) < 1). \]

Finally, let us give a more flexible variant of Proposition 6.1 with an arbitrary variance
$B^2 = \mathrm{Var}(X)$ $(B > 0)$, but still with mean zero. Applying the obtained statements to the
random variable X/B and replacing the variable T with T/B, we then get that

\[ \frac{1}{B^2} \int_{\{|x| \ge T\}} x^2 p(x)\, dx \le 4D(X) + 4e^{-T^2/(4B^2)}. \]




7. Entropic control of tails for sums of independent summands 

We apply Proposition 6.1 in the following situation. Assume we have two independent 
random variables X and Y with mean zero, but perhaps with different variances Var(X) and 
Var(y). Assume they have densities. The question is; Can we bound the tail functions 5x and 
5y in terms of D{X + T), rather than in terms of D{X) and D(Y)? In case Var(X + T) = 1, 
by Proposition 6.1, applied to the sum X + Y, 

5x+y{T) = E (A + Yf l{|x+y|>r} < ^D{X + Y) + Ae-^"/\ (7.1) 

Hence, to answer the question, it would be sufficient to bound from below the tail functions 
5x+y in terms of 5x and 5y- 

Assume for a while that Var(A + Y) = 1/2. In particular, Var(y) < 1/2, and according to 
the usual Chebyshev’s inequality, PjT > -l}>i Hence, for all T > 0, 

E (A + y)^ 1{X+Y>T} > E (A + y)^ 1{X>T+1, Y>-1} 

> E (A - 1)^ Y>_i} > -E(A — 1)^1 {x>t+i}- 

If A > T+1 > 4, then clearly (A—1)^ > ^ A^, hence, E (A—1)^ l{x>T+i} ^ \ E A^ l{x>T+i}- 
With a similar bound for the range A < — (T + 1), we get 

5x+y{T)>-^8x{T + 1), r>3. (7.2) 

Now, change $T+1$ with $T$ (assuming that $T \ge 4$) and apply (7.1) to $\sqrt2\,(X+Y)$. Together
with (7.2) it gives $\frac14\, \delta_{\sqrt2 X}(T) \le 4D(\sqrt2\,(X+Y)) + 4e^{-(T-1)^2/4}$. But the entropic distance to
the normal is invariant under rescaling of coordinates, i.e., $D(\sqrt2\,(X+Y)) = D(X+Y)$. Since
also $\delta_{\sqrt2 X}(T) = 2\,\delta_X(T/\sqrt2)$, we obtain that

$$\delta_X(T/\sqrt2) \le 8D(X+Y) + 8e^{-(T-1)^2/4},$$

provided that $T \ge 4$. Simplifying by $e^{-(T-1)^2/4} \le e^{-T^2/8}$ (valid for $T \ge 4$), and then replacing
$T$ with $T\sqrt2$, we arrive at

$$\delta_X(T) \le 8D(X+Y) + 8e^{-T^2/4}, \qquad T \ge 4/\sqrt2.$$

Finally, to involve the values $0 \le T \le 4/\sqrt2$, just use $e^2 < 8$, so that the above inequality
holds automatically for this range: $\delta_X(T) \le \mathrm{Var}(X) \le 1 < 8e^{-T^2/4}$. Moreover, in order to
allow an arbitrary variance $\mathrm{Var}(X+Y) = B^2$ ($B > 0$), the above estimate should be applied
to $X/(B\sqrt2)$ and $Y/(B\sqrt2)$ with $T$ replaced by $T/(B\sqrt2)$. Then it takes the form

$$\frac{1}{2B^2}\,\delta_X(T) \le 8D(X+Y) + 8e^{-T^2/8B^2}.$$

We can summarize. 

Proposition 7.1. Let X and Y be independent random variables with mean zero and with
$\mathrm{Var}(X+Y) = B^2$ ($B > 0$). Assume X has a density p. Then, for all $T \ge 0$,

$$\frac{1}{B^2}\int_{\{|x| \ge T\}} x^2\, p(x)\,dx \le 16\,D(X+Y) + 16\,e^{-T^2/8B^2}.$$
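A simple sanity check of Proposition 7.1, read here as $B^{-2}\delta_X(T) \le 16\,D(X+Y) + 16\,e^{-T^2/8B^2}$, is the case of two independent centered normal variables, where $D(X+Y) = 0$ and the quadratic tail of $X$ has a closed form. The sketch below uses hypothetical helper names and also checks two elementary facts used in the derivation ($e^2 < 8$ and $(T-1)^2 \ge T^2/2$ at $T = 4$).

```python
import math

def normal_quadratic_tail(v: float, T: float) -> float:
    """E[X^2 * 1{|X| >= T}] for X ~ N(0, v^2), via integration by parts."""
    t = T / v
    density = math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(t / math.sqrt(2.0))
    return 2.0 * v * v * (t * density + upper_tail)

def prop_7_1_rhs(B: float, T: float, D_sum: float = 0.0) -> float:
    """16 * D(X+Y) + 16 * exp(-T^2 / (8 B^2)) (assumed reading of Proposition 7.1)."""
    return 16.0 * D_sum + 16.0 * math.exp(-T * T / (8.0 * B * B))

def check_gaussian_case(v1: float, v2: float, T: float):
    """Return (left side, right side) for X ~ N(0, v1^2), Y ~ N(0, v2^2) independent."""
    B = math.sqrt(v1 * v1 + v2 * v2)           # B^2 = Var(X + Y)
    lhs = normal_quadratic_tail(v1, T) / (B * B)
    return lhs, prop_7_1_rhs(B, T)             # D(X + Y) = 0 for a normal sum

print(check_gaussian_case(0.6, 0.8, 2.0))
print(math.e ** 2 < 8.0, (4.0 - 1.0) ** 2 >= 4.0 ** 2 / 2.0)
```

The Gaussian case only tests the exponential term of the bound; it says nothing about the sharpness of the constant 16 in front of $D(X+Y)$.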
8. Stability for Levy distance under entropic hypothesis


Now we can return to the variant of the Chistyakov-Golinskii result, as in Theorem 4.4.
Let the independent random variables X and Y have mean zero, with $\mathrm{Var}(X+Y) = 1$, and
denote by F and G their distribution functions. Also assume X has a density p. In order to
control the term $\delta_X(N)$ in Theorem 4.4, we are going to impose the stronger condition

$$D(X+Y) \le 2\varepsilon.$$

Using Pinsker's inequality, this yields bounds for the total variation and Kolmogorov distances

$$\|F*G - \Phi\| \le \frac12\,\|F*G - \Phi\|_{\rm TV} \le \frac12\sqrt{2D(X+Y)} \le \sqrt{\varepsilon} = \varepsilon'.$$

Hence, the assumption of Theorem 4.4 is fulfilled, whenever $\varepsilon < 1$.

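As for the conclusion, first apply Proposition 7.1 with $B = 1$, which gives

$$\delta_X(T) = \int_{\{|x| \ge T\}} x^2\, p(x)\,dx \le 16\,D(X+Y) + 16\,e^{-T^2/8} \le 32\,\varepsilon + 16\,e^{-T^2/8}.$$

In our situation, $N = 1 + \sqrt{2\log(1/\varepsilon')} = 1 + \sqrt{\log(1/\varepsilon)}$, so $\delta_X(N) \le 32\,\varepsilon + 16\,e^{-N^2/8}$.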

Thus, we arrive at: 


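Proposition 8.1. Let the independent random variables X and Y have mean zero, with
$\mathrm{Var}(X+Y) = 1$, and assume that X has a density with distribution function F. If $D(X+Y) \le
2\varepsilon < 2$, then

$$L(F, \Phi_{v_1}) \le C\,\frac{(\log\log\frac{4}{\varepsilon})^2}{\sqrt{\log\frac{1}{\varepsilon}}},$$

where $v_1 = \sqrt{\mathrm{Var}(X)}$ and C is an absolute constant.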


In general, in the conclusion one cannot replace the Levy distance $L(F, \Phi_{v_1})$ with the
entropic distance $D(X)$. However, this is indeed possible for regularized distributions, as we
will see in the next sections.


9. Entropic distance and uniform deviation of densities 


Let X and Y be independent random variables with mean zero, finite variances, and assume
X has a bounded density p. Our next aim is to estimate the entropic distance to the normal,
$D(X)$, in terms of $D(X+Y)$ and the uniform deviation of p above the normal density

$$\Delta(X) = \operatorname{ess\,sup}_x\, (p(x) - \varphi_v(x)),$$

where $v^2 = \mathrm{Var}(X)$ and $\varphi_v$ stands for the density of the normal law $N(0, v^2)$.

For a while, assume that $\mathrm{Var}(X) = 1$. Proposition A.3.2 gives the preliminary estimate

$$D(X) \le \Delta(X)\Big[\sqrt{2\pi} + 2T + 2T\log\big(1 + \Delta(X)\sqrt{2\pi}\,e^{T^2/2}\big)\Big] + \frac12\,\delta_X(T),$$

involving the quadratic tail function $\delta_X(T)$. In the general situation one cannot say anything
definite about the decay of this function. However, it can be bounded in terms of $D(X+Y)$
by virtue of Proposition 7.1: we know that, for all $T \ge 0$,

$$\frac{1}{2B^2}\,\delta_X(T) \le 8D(X+Y) + 8e^{-T^2/8B^2},$$

where $B^2 = \mathrm{Var}(X+Y) = 1 + \mathrm{Var}(Y)$. So, combining the two estimates yields

$$D(X) \le 8B^2 D(X+Y) + 8B^2 e^{-T^2/8B^2} + \Delta\Big[\sqrt{2\pi} + 2T + 2T\log\big(1 + \Delta\sqrt{2\pi}\,e^{T^2/2}\big)\Big], \qquad \text{where } \Delta = \Delta(X).$$


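First assume $\Delta \le 1$ and apply the above with $T^2 = 8B^2\log\frac{1}{\Delta}$. Then $8B^2 e^{-T^2/8B^2} = 8B^2\Delta$,
and putting $\beta = 4B^2 - 1 \ge 3$, we also have

$$\log\big(1 + \Delta\sqrt{2\pi}\,e^{T^2/2}\big) = \log\frac{\Delta^{\beta} + \sqrt{2\pi}}{\Delta^{\beta}} = \beta\log\frac{(\Delta^{\beta} + \sqrt{2\pi})^{1/\beta}}{\Delta} \le \beta\log\Big(1 + \frac{(2\pi)^{1/2\beta}}{\Delta}\Big) \le \beta\log\Big(1 + \frac{2}{\Delta}\Big).$$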


Collecting all the terms and using $B \ge 1$, we are led to the estimate of the form

$$D(X) \le 8B^2 D(X+Y) + CB^3\,\Delta\log^{3/2}\Big(2 + \frac{1}{\Delta}\Big),$$

where $C > 0$ is an absolute constant. It holds also in case $\Delta > 1$ in view of the logarithmic
bound of Proposition A.3.1,

$$D(X) \le \log\big(1 + \Delta\sqrt{2\pi}\big) + \frac12.$$

Therefore, the obtained bound holds true without any restriction on $\Delta$.
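The effect of the choice of $T$ in this argument can be checked numerically. The sketch below assumes the choice reads $T^2 = 8B^2\log(1/\Delta)$ with $\beta = 4B^2 - 1$, and verifies that the exponential term becomes $8B^2\Delta$, that $\Delta\sqrt{2\pi}\,e^{T^2/2} = \sqrt{2\pi}\,\Delta^{-\beta}$, and that the logarithm is at most $\beta\log(1 + 2/\Delta)$. The function name is hypothetical.

```python
import math

def check_choice_of_T(B: float, delta: float):
    """Evaluate the quantities arising from T^2 = 8 B^2 log(1/delta), for B >= 1 and 0 < delta <= 1."""
    T_squared = 8.0 * B * B * math.log(1.0 / delta)
    beta = 4.0 * B * B - 1.0
    # Exponential tail term 8 B^2 exp(-T^2 / (8 B^2)); it should equal 8 B^2 delta.
    tail_term = 8.0 * B * B * math.exp(-T_squared / (8.0 * B * B))
    # Logarithmic term as it appears in the combined estimate.
    log_term = math.log1p(delta * math.sqrt(2.0 * math.pi) * math.exp(T_squared / 2.0))
    # The same term rewritten as log(1 + sqrt(2 pi) / delta^beta).
    closed_form = math.log1p(math.sqrt(2.0 * math.pi) * delta ** (-beta))
    # Simplified upper bound beta * log(1 + 2 / delta).
    upper = beta * math.log1p(2.0 / delta)
    return tail_term, log_term, closed_form, upper

print(check_choice_of_T(1.0, 0.1))
```

With this choice both $T$ and the logarithm are of order $B^2\log(1/\Delta)$ at most, which is where the factor $B^3\,\Delta\log^{3/2}(2 + 1/\Delta)$ in the final estimate comes from.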

Now, to relax the variance assumption, assume $\mathrm{Var}(X) = v_1^2$, $\mathrm{Var}(Y) = v_2^2$ ($v_1, v_2 > 0$),
and without loss of generality, let $\mathrm{Var}(X+Y) = v_1^2 + v_2^2 = 1$. Apply the above to $X' = X/v_1$,
$Y' = Y/v_1$. Then, $B^2 = 1/v_1^2$ and $\Delta(X') = v_1\Delta(X)$, so with some absolute constant $c > 0$,

$$c\,v_1^2\,D(X) \le D(X+Y) + \Delta(X)\log^{3/2}\Big(2 + \frac{1}{v_1\Delta(X)}\Big).$$

As a result, we arrive at:


Proposition 9.1. Let X, Y be independent random variables with mean zero, $\mathrm{Var}(X+Y) = 1$,
and such that X has a bounded density. Then, with some absolute constant $c > 0$,

$$c\,\mathrm{Var}(X)\,D(X) \le D(X+Y) + \Delta(X)\log^{3/2}\Big(2 + \frac{1}{\sqrt{\mathrm{Var}(X)}\,\Delta(X)}\Big).$$

Interchanging the roles of X and Y, and adding the two inequalities, we also have as a corollary:


Proposition 9.2. Let X, Y be independent random variables with mean zero and positive
variances $v_1^2 = \mathrm{Var}(X)$, $v_2^2 = \mathrm{Var}(Y)$, such that $v_1^2 + v_2^2 = 1$, and both with densities. Then,
with some absolute constant $c > 0$,

$$c\,\big(v_1^2 D(X) + v_2^2 D(Y)\big) \le D(X+Y) + \Delta(X)\log^{3/2}\Big(2 + \frac{1}{v_1\Delta(X)}\Big) + \Delta(Y)\log^{3/2}\Big(2 + \frac{1}{v_2\Delta(Y)}\Big).$$


This inequality may be viewed as the inverse to the general property of the entropic distance,
which we mentioned before, namely, $v_1^2 D(X) + v_2^2 D(Y) \ge D(X+Y)$, under the normalization
assumption $v_1^2 + v_2^2 = 1$. Let us also state separately Proposition 9.1 in the particular case of
equal unit variances, keeping the explicit constant $8B^2 = 16$ in front of $D(X+Y)$.

Proposition 9.3. Let X, Y be independent random variables with mean zero and variances
$\mathrm{Var}(X) = \mathrm{Var}(Y) = 1$, and such that X has a density. Then, with some absolute constant C,

$$D(X) \le 16\,D(X+Y) + C\,\Delta(X)\log^{3/2}\Big(2 + \frac{1}{\Delta(X)}\Big).$$

One may simplify the right-hand side for small values of $\Delta(X)$ and get a slightly weaker
inequality $D(X) \le 16\,D(X+Y) + C_\alpha\,\Delta(X)^\alpha$, $0 < \alpha < 1$, where the constants $C_\alpha$ depend
on $\alpha$ only. For large values of $\Delta(X)$, the above inequality holds as well, in view of the
logarithmic bound of Proposition A.3.1.
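The simplification to a power $\Delta(X)^\alpha$ rests on the elementary fact that $t\log^{3/2}(2 + 1/t) \le C_\alpha t^\alpha$ for $0 < t \le 1$. The sketch below estimates such a constant on a grid; the function names and the grid range are assumptions of this illustration, and the grid value is only an approximation of the true supremum (for $\alpha$ very close to 1 the maximum moves to values of $t$ below the grid).

```python
import math

def g(t: float, alpha: float) -> float:
    """t^(1 - alpha) * log^{3/2}(2 + 1/t); its supremum over 0 < t <= 1 serves as C_alpha."""
    return t ** (1.0 - alpha) * math.log(2.0 + 1.0 / t) ** 1.5

def estimate_C_alpha(alpha: float, grid_points: int = 20000) -> float:
    """Grid estimate of the supremum of g(., alpha) on a log-spaced grid from 1 down to 1e-12."""
    best = 0.0
    for k in range(grid_points + 1):
        t = 10.0 ** (-12.0 * k / grid_points)
        best = max(best, g(t, alpha))
    return best

for alpha in (0.5, 0.75, 0.9):
    print(alpha, round(estimate_C_alpha(alpha), 3))
```

The estimated constant grows as $\alpha$ approaches 1, which reflects the logarithmic factor that the power bound has to absorb.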


10. The case of equal variances 


We are prepared to derive an entropic variant of Sapogov-type stability theorem for regularized
distributions. That is, we are going to estimate $D(X_\sigma)$ and $D(Y_\sigma)$ in terms of $D(X_\sigma + Y_\sigma)$
for two independent random variables X and Y with distribution functions F and G, by
involving a small “smoothing” parameter $\sigma > 0$. It will not be important whether or not they
have densities. For convenience, let X and Y have mean zero; this assumption does not affect
the final statements. Recall that, given $\sigma > 0$, the regularized random variables are defined by $X_\sigma = X + \sigma Z$,
$Y_\sigma = Y + \sigma Z$, where Z is independent of X and Y, and has a standard normal density $\varphi$. The
distributions of $X_\sigma$, $Y_\sigma$ are denoted $F_\sigma$, $G_\sigma$, with densities $p_\sigma$, $q_\sigma$.

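In this section, we consider the case of equal variances, say, $\mathrm{Var}(X) = \mathrm{Var}(Y) = 1$. Put

$$\sigma_1 = \sqrt{1 + \sigma^2}, \qquad \sigma_2 = \sqrt{1 + 2\sigma^2}.$$

Since $\mathrm{Var}(X_\sigma) = \mathrm{Var}(Y_\sigma) = \sigma_1^2$, the corresponding entropic distances are given by

$$D(X_\sigma) = h(\sigma_1 Z) - h(X_\sigma) = \int_{-\infty}^{\infty} p_\sigma(x)\log\frac{p_\sigma(x)}{\varphi_{\sigma_1}(x)}\,dx,$$

and similarly for $Y_\sigma$, where, as before, $\varphi_v$ represents the density of $N(0, v^2)$. Assume that
$D(X_\sigma + Y_\sigma)$ is small in the sense that $D(X_\sigma + Y_\sigma) \le 2\varepsilon < 2$. According to Pinsker's inequality,
this yields bounds for the total variation and Kolmogorov distances

$$\|F_\sigma * G_\sigma - \Phi_{\sigma_2}\| \le \frac12\,\|F_\sigma * G_\sigma - \Phi_{\sigma_2}\|_{\rm TV} \le \sqrt{\varepsilon} < 1.$$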

In the sequel, let $0 < \sigma \le 1$. This guarantees that the ratio of variances of the components
in the convolution $F_\sigma * G_\sigma = F * (G * \Phi_{\sigma\sqrt2})$ is bounded away from zero by an absolute
constant, so that we can apply Theorem 2.3. Namely, it gives that $\|F - \Phi\| \le C\log^{-1/2}(\frac{1}{\varepsilon})$,
and similarly for G. (Note that raising $\varepsilon$ to any positive power does not change the above
estimate.) Applying Proposition A.2.1 a), when one of the distributions is normal, we get

$$\Delta(X_\sigma) = \sup_x\, \big(p_\sigma(x) - \varphi_{\sigma_1}(x)\big) \le \frac{1}{\sigma}\,\|F - \Phi\| \le \frac{C}{\sigma\sqrt{\log\frac{1}{\varepsilon}}}.$$


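We are in position to apply Proposition 9.3 to the random variables $X_\sigma/\sigma_1$, $Y_\sigma/\sigma_1$. It gives

$$D(X_\sigma) \le 16\,D(X_\sigma + Y_\sigma) + C\,\Delta(X_\sigma)\log^{3/2}\Big(2 + \frac{1}{\Delta(X_\sigma)}\Big) \le 32\,\varepsilon + \frac{C'}{\sigma\sqrt{\log\frac{1}{\varepsilon}}}\,\log^{3/2}\Big(2 + \sigma\sqrt{\log\tfrac{1}{\varepsilon}}\Big),$$

where C' is an absolute constant. In the last expression the second term dominates the first
one, and at this point, the assumption on the means may be removed. We arrive at:

Proposition 10.1. Let X, Y be independent random variables with mean zero and variance
one. Given $0 < \varepsilon < 1$ and $0 < \sigma \le 1$, the regularized random variables $X_\sigma$, $Y_\sigma$ satisfy

$$D(X_\sigma + Y_\sigma) \le 2\varepsilon \ \Longrightarrow\ D(X_\sigma) + D(Y_\sigma) \le C\,\frac{\log^{3/2}\big(2 + \sigma\sqrt{\log\frac{1}{\varepsilon}}\big)}{\sigma\sqrt{\log\frac{1}{\varepsilon}}}, \qquad (10.1)$$

where C is an absolute constant.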


This statement may be formulated equivalently by solving the above inequality with respect
to $\varepsilon$. The function $u(x) = \frac{x}{\log^{3/2}(2+x)}$ is increasing in $x \ge 0$, and, for any $a \ge 0$, $u(x) \le a \Rightarrow
x \le 8a\log^{3/2}(2+a)$. Hence, assuming $D(X_\sigma + Y_\sigma) \le 1$, we obtain from (10.1) that

$$\sigma\sqrt{\log\frac{1}{\varepsilon}} \le \frac{8C}{D}\log^{3/2}(2 + C/D) \le \frac{C'}{D}\log^{3/2}(2 + 1/D)$$

with some absolute constant C', where $D = D(X_\sigma) + D(Y_\sigma)$. As a result,

$$D(X_\sigma + Y_\sigma) \ge \exp\Big\{-\frac{C\log^3(2 + 1/D)}{\sigma^2 D^2}\Big\}.$$

Note also that this inequality is fulfilled automatically, if $D(X_\sigma + Y_\sigma) \ge 1$. Thus, we get:

Proposition 10.2. Let X, Y be independent random variables with $\mathrm{Var}(X) = \mathrm{Var}(Y) = 1$.
Given $0 < \sigma \le 1$, the regularized random variables $X_\sigma$ and $Y_\sigma$ satisfy

$$D(X_\sigma + Y_\sigma) \ge \exp\Big\{-\frac{C\log^3(2 + 1/D)}{\sigma^2 D^2}\Big\},$$

where $D = D(X_\sigma) + D(Y_\sigma)$ and $C > 0$ is an absolute constant.
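The inversion step behind Proposition 10.2 uses two properties of the auxiliary function, read here as $u(x) = x/\log^{3/2}(2+x)$: it is increasing, and $u(x) \le a$ implies $x \le 8a\log^{3/2}(2+a)$. Since the right-hand side is increasing in $a$, it suffices to test the implication at $a = u(x)$. A small numerical sketch, with hypothetical function names, checks both properties on a grid.

```python
import math

def u(x: float) -> float:
    """Auxiliary function u(x) = x / log^{3/2}(2 + x) for x >= 0."""
    return x / math.log(2.0 + x) ** 1.5

def inversion_bound(a: float) -> float:
    """Claimed bound on x whenever u(x) <= a, namely 8 * a * log^{3/2}(2 + a)."""
    return 8.0 * a * math.log(2.0 + a) ** 1.5

# Log-spaced grid from 1e-6 to 1e12.
grid = [10.0 ** (k / 10.0) for k in range(-60, 121)]
is_increasing = all(u(x) < u(y) for x, y in zip(grid, grid[1:]))
inversion_holds = all(x <= inversion_bound(u(x)) for x in grid)
print(is_increasing, inversion_holds)
```

A grid check is of course no proof; monotonicity can be confirmed by differentiating $u$, whose derivative has the sign of $\log(2+x) - \frac{3x}{2(2+x)}$, a quantity minimized at $x = 1$ where it is positive.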

11. Proof of Theorem 1.1 

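Now let us consider the case of arbitrary variances

$$\mathrm{Var}(X) = v_1^2, \qquad \mathrm{Var}(Y) = v_2^2 \qquad (v_1, v_2 > 0).$$

For normalization reasons, let $v_1^2 + v_2^2 = 1$. Then

$$\mathrm{Var}(X_\sigma) = v_1^2 + \sigma^2, \qquad \mathrm{Var}(Y_\sigma) = v_2^2 + \sigma^2, \qquad \mathrm{Var}(X_\sigma + Y_\sigma) = \sigma_2^2,$$

where $\sigma_2 = \sqrt{1 + 2\sigma^2}$. As before, we assume that both X and Y have mean zero, although
this will not be important for the final conclusion.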

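Again, we start with the hypothesis $D(X_\sigma + Y_\sigma) \le 2\varepsilon < 2$ and apply Pinsker's inequality:

$$\|F_\sigma * G_\sigma - \Phi_{\sigma_2}\| \le \frac12\,\|F_\sigma * G_\sigma - \Phi_{\sigma_2}\|_{\rm TV} \le \sqrt{\varepsilon} < 1.$$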


For $0 < \sigma \le 1$, write $F_\sigma * G_\sigma = F * (G * \Phi_{\sigma\sqrt2})$. Now, the ratio of variances of the components
in the convolution may not be bounded away from zero, since $v_1$ is allowed to be small.
Hence, the application of Theorem 2.3 will only give a bound on $\|F - \Phi_{v_1}\|$, and similarly for
G, whose right-hand side involves $v_1$. The appearance of $v_1$ on the right is however not desirable. So, it is better to involve the
Levy distance, which is more appropriate in such a situation. Consider the random variables

$$X' = \frac{X}{\sqrt{1 + 2\sigma^2}}, \qquad Y' = \frac{Y + \sigma\sqrt2\,Z}{\sqrt{1 + 2\sigma^2}},$$

so that $\mathrm{Var}(X' + Y') = 1$, and denote by F', G' their distribution functions. Since the
Kolmogorov distance does not change after rescaling of the coordinates, we still have

$$L(F' * G', \Phi) \le \|F' * G' - \Phi\| = \|F_\sigma * G_\sigma - \Phi_{\sigma_2}\| \le \sqrt{\varepsilon} < 1.$$

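In this situation, we may apply Proposition 8.1 to the couple (F', G'). It gives that

$$L(F', \Phi_{v_1'}) \le C\,\Big(\log\log\frac{4}{\varepsilon}\Big)^2 \Big(\log\frac{1}{\varepsilon}\Big)^{-1/2}$$

with some absolute constant C, where $v_1' = \sqrt{\mathrm{Var}(X')} = v_1/\sqrt{1 + 2\sigma^2}$. Since $v_1' \le v_1 \le \sqrt3\,v_1'$,
we have a similar conclusion about the original distribution functions, i.e. $L(F, \Phi_{v_1}) \le
C\,(\log\log\frac{4}{\varepsilon})^2\,(\log\frac{1}{\varepsilon})^{-1/2}$. Now we use Proposition A.2.3 (applied when one of the
distributions is normal), which for $\sigma \le 1$ gives $\Delta(X_\sigma) \le \frac{3}{2\sigma^2}\,L(F, \Phi_{v_1})$, and similarly for Y. Hence,

$$\Delta(X_\sigma) \le C\,\frac{(\log\log\frac{4}{\varepsilon})^2}{\sigma^2\sqrt{\log\frac{1}{\varepsilon}}}, \qquad \Delta(Y_\sigma) \le C\,\frac{(\log\log\frac{4}{\varepsilon})^2}{\sigma^2\sqrt{\log\frac{1}{\varepsilon}}}. \qquad (11.1)$$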


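We are now in a position to apply Proposition 9.2 to the random variables $X_\sigma' = X_\sigma/\sqrt{1 + 2\sigma^2}$,
$Y_\sigma' = Y_\sigma/\sqrt{1 + 2\sigma^2}$, which ensures that with some absolute constant $c > 0$

$$c\,\big(v_1(\sigma)^2 D(X_\sigma) + v_2(\sigma)^2 D(Y_\sigma)\big) \le D(X_\sigma + Y_\sigma) + \Delta(X_\sigma)\log^{3/2}\Big(2 + \frac{1}{v_1(\sigma)\Delta(X_\sigma)}\Big) + \Delta(Y_\sigma)\log^{3/2}\Big(2 + \frac{1}{v_2(\sigma)\Delta(Y_\sigma)}\Big),$$

where $v_1(\sigma)^2 = \mathrm{Var}(X_\sigma') = \frac{v_1^2 + \sigma^2}{1 + 2\sigma^2}$ and $v_2(\sigma)^2 = \mathrm{Var}(Y_\sigma') = \frac{v_2^2 + \sigma^2}{1 + 2\sigma^2}$ ($v_1(\sigma), v_2(\sigma) > 0$). Note that
$v_1(\sigma) \ge \sigma/\sqrt3$. Applying the bounds in (11.1), we obtain that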


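$$c\,\big(v_1(\sigma)^2 D(X_\sigma) + v_2(\sigma)^2 D(Y_\sigma)\big) \le D(X_\sigma + Y_\sigma) + \frac{(\log\log\frac{4}{\varepsilon})^2}{\sigma^2\sqrt{\log\frac{1}{\varepsilon}}}\,\log^{3/2}\Big(2 + \frac{\sqrt{\log\frac{1}{\varepsilon}}}{(\log\log\frac{4}{\varepsilon})^2}\Big)$$

with some other absolute constant $c > 0$. Here, $D(X_\sigma + Y_\sigma) \le 2\varepsilon$, which is dominated by the
last expression, and we arrive at: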


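Proposition 11.1. Let X, Y be independent random variables with total variance one.
Given $0 < \sigma \le 1$, if the regularized random variables $X_\sigma$, $Y_\sigma$ satisfy $D(X_\sigma + Y_\sigma) \le 2\varepsilon < 2$,
then with some absolute constant C

$$\mathrm{Var}(X_\sigma)\,D(X_\sigma) + \mathrm{Var}(Y_\sigma)\,D(Y_\sigma) \le C\,\frac{(\log\log\frac{4}{\varepsilon})^2}{\sigma^2\sqrt{\log\frac{1}{\varepsilon}}}\,\log^{3/2}\Big(2 + \frac{\sqrt{\log\frac{1}{\varepsilon}}}{(\log\log\frac{4}{\varepsilon})^2}\Big). \qquad (11.2)$$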


It remains to solve this inequality with respect to $\varepsilon$. Denote by $D'$ the left-hand side of (11.2)
and let $D = \sigma^2 D'$. Assuming that $D(X_\sigma + Y_\sigma) < 2$ and arguing as in the proof of Proposition

10.2, we get log3/2(2 + G/D'), hence ^ = ft log^(2 + V^) with 

some absolute constant C'. The latter inequality implies with some absolute constants 

1 G"' 

log - < C'"Alog"^(2 + A)<^ log^(2 -h 1/F), 

and we arrive at the inequality of Theorem 1.1 (which holds automatically, if $D(X_\sigma + Y_\sigma) \ge 1$).

12. Appendix I: General bounds for distances between distribution functions

Here we collect a few elementary and basically known relations for classical metrics, introduced
at the beginning of Section 2. Let F and G be arbitrary distribution functions of some
random variables X and Y. First of all, the Levy, Kolmogorov, and the total variation distances
are connected by the chain of the inequalities $0 \le L(F,G) \le \|F - G\| \le \frac12\,\|F - G\|_{\rm TV} \le 1$. As
for the Kantorovich-Rubinshtein distance, there is the following well-known bound.

Proposition A.1.1. We have $L(F,G) \le W_1(F,G)^{1/2}$.

Proposition A.1.2. If $\int_{-\infty}^{\infty} x^2\,dF(x) \le B^2$ and $\int_{-\infty}^{\infty} x^2\,dG(x) \le B^2$ ($B \ge 0$), then

a) $W_1(F,G) \le 2L(F,G) + 4B\,L(F,G)^{1/2}$ and b) $W_1(F,G) \le 4B\,\|F - G\|^{1/2}$.


Proof. It follows from the definition of the Levy distance $h = L(F,G)$ that, for all $x \in \mathbb{R}$,

$$|F(x) - G(x)| \le (F(x+h) - F(x)) + (G(x+h) - G(x)) + h.$$

Integrating this inequality over a finite interval $(a, b)$, $a < b$, and using a general relation
$\int_{-\infty}^{\infty} (F(x+y) - F(x))\,dx = y$ ($y \ge 0$), we get

$$\int_a^b |F(x) - G(x)|\,dx \le h\,(2 + (b - a)).$$

By Chebyshev's inequality, $P\{X \ge x\} \le \frac{B^2}{x^2}$ and $P\{X \le -x\} \le \frac{B^2}{x^2}$ ($x > 0$), and similarly for
Y. Hence, $|F(x) - G(x)| \le \frac{B^2}{x^2}$, and for any $b > 0$,

$$\int_{\{|x| \ge b\}} |F(x) - G(x)|\,dx \le \int_{\{|x| \ge b\}} \frac{B^2}{x^2}\,dx = \frac{2B^2}{b}.$$

Using the previous estimate over the finite interval with $a = -b$, we arrive at

$$\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx \le 2h\,(1 + b) + \frac{2B^2}{b}.$$

This bound can be optimized over all $b > 0$ by taking $b = B/\sqrt{h}$, and then we get the estimate
in a). In case of the Kolmogorov distance, one can use similar arguments. Indeed,

$$\int_{-b}^{b} |F(x) - G(x)|\,dx \le 2hb \qquad \text{with } h = \|F - G\|.$$

Hence, $\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx \le 2hb + \frac{2B^2}{b}$. The optimal choice $b = B/\sqrt{h}$ leads to the second
bound of the proposition. □
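The optimization over the truncation level $b$ in the proof can be verified directly. The sketch below, with hypothetical function names, evaluates the two upper bounds $2h(1+b) + 2B^2/b$ and $2hb + 2B^2/b$ at $b = B/\sqrt{h}$ and compares them with the closed forms $2h + 4B\sqrt{h}$ and $4B\sqrt{h}$ appearing in parts a) and b).

```python
import math

def levy_truncation_bound(h: float, B: float, b: float) -> float:
    """Upper bound 2h(1 + b) + 2B^2/b used for part a), with h the Levy distance."""
    return 2.0 * h * (1.0 + b) + 2.0 * B * B / b

def kolmogorov_truncation_bound(h: float, B: float, b: float) -> float:
    """Upper bound 2hb + 2B^2/b used for part b), with h the Kolmogorov distance."""
    return 2.0 * h * b + 2.0 * B * B / b

def optimal_b(h: float, B: float) -> float:
    """Minimizer b = B / sqrt(h) of both bounds (the derivative 2h - 2B^2/b^2 vanishes there)."""
    return B / math.sqrt(h)

h, B = 0.01, 2.0
b_star = optimal_b(h, B)
print(levy_truncation_bound(h, B, b_star), 2.0 * h + 4.0 * B * math.sqrt(h))
print(kolmogorov_truncation_bound(h, B, b_star), 4.0 * B * math.sqrt(h))
```

Both bounds are convex in $b$, so the stationary point is the global minimizer on $b > 0$.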


13. Appendix II: Relations for distances between regularized distributions 


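Now, let us turn to the regularized random variables $X_\sigma = X + \sigma Z$, $Y_\sigma = Y + \sigma Z$, where
$\sigma > 0$ is a fixed parameter and $Z \sim N(0,1)$ is a standard normal random variable independent
of X and Y. They have distribution functions

$$F_\sigma(x) = \int_{-\infty}^{\infty} F(x-y)\,d\Phi_\sigma(y) = \int_{-\infty}^{\infty} \Phi_\sigma(x-y)\,dF(y),$$

$$G_\sigma(x) = \int_{-\infty}^{\infty} G(x-y)\,d\Phi_\sigma(y) = \int_{-\infty}^{\infty} \Phi_\sigma(x-y)\,dG(y),$$

and densities

$$p_\sigma(x) = \int_{-\infty}^{\infty} \varphi_\sigma(x-y)\,dF(y) = -\frac{1}{\sigma^2}\int_{-\infty}^{\infty} F(x-y)\,y\,\varphi_\sigma(y)\,dy,$$

$$q_\sigma(x) = \int_{-\infty}^{\infty} \varphi_\sigma(x-y)\,dG(y) = -\frac{1}{\sigma^2}\int_{-\infty}^{\infty} G(x-y)\,y\,\varphi_\sigma(y)\,dy.$$

So, in terms of the Kolmogorov distance,

$$|p_\sigma(x) - q_\sigma(x)| \le \frac{1}{\sigma^2}\,\|F - G\|\int_{-\infty}^{\infty} |y|\,\varphi_\sigma(y)\,dy = \frac{2}{\sqrt{2\pi}}\,\frac{\|F - G\|}{\sigma}.$$

Similarly,

$$\int_{-\infty}^{\infty} |p_\sigma(x) - q_\sigma(x)|\,dx \le \frac{1}{\sigma^2}\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx \int_{-\infty}^{\infty} |y|\,\varphi_\sigma(y)\,dy = \frac{2}{\sqrt{2\pi}}\,\frac{W_1(F,G)}{\sigma}.$$

Simplifying with the help of $\frac{2}{\sqrt{2\pi}} < 1$, let us state these bounds once more.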

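Proposition A.2.1. We have

a) $\sup_x |p_\sigma(x) - q_\sigma(x)| \le \frac{1}{\sigma}\,\|F - G\|$; b) $\|F_\sigma - G_\sigma\|_{\rm TV} \le \frac{1}{\sigma}\,W_1(F,G)$.

Thus, if F is close to G in a weak sense, then the regularized distributions will be close in a
much stronger sense, at least when $\sigma$ is not very small. Now, applying the general Proposition
A.1.2, one may replace $W_1(F,G)$ in part b) with other metrics:

Proposition A.2.2. If $\int_{-\infty}^{\infty} x^2\,dF(x) \le B^2$ and $\int_{-\infty}^{\infty} x^2\,dG(x) \le B^2$ ($B \ge 0$), then

a) $\|F_\sigma - G_\sigma\|_{\rm TV} \le \frac{2}{\sigma}\,\big[L(F,G) + 2B\,L(F,G)^{1/2}\big]$;

b) $\|F_\sigma - G_\sigma\|_{\rm TV} \le \frac{4B}{\sigma}\,\|F - G\|^{1/2}$.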
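Part a) of Proposition A.2.1, read here as $\sup_x |p_\sigma(x) - q_\sigma(x)| \le \frac{1}{\sigma}\|F - G\|$, can be illustrated with an example chosen for this sketch: $F = N(0,1)$ and $G = N(a,1)$. Then the regularized densities are those of $N(0, 1+\sigma^2)$ and $N(a, 1+\sigma^2)$, and the Kolmogorov distance equals $2\Phi(|a|/2) - 1$. The function names and the grid are assumptions of the illustration.

```python
import math

def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def kolmogorov_distance_shifted_normals(a: float) -> float:
    """sup_x |Phi(x) - Phi(x - a)| = 2 Phi(|a|/2) - 1 = erf(|a| / (2 sqrt 2))."""
    return math.erf(abs(a) / (2.0 * math.sqrt(2.0)))

def sup_density_gap(a: float, sigma: float, grid_points: int = 4001) -> float:
    """Grid approximation of sup_x |p_sigma(x) - q_sigma(x)| for F = N(0,1), G = N(a,1)."""
    sd = math.sqrt(1.0 + sigma * sigma)        # both regularized laws have this standard deviation
    half_width = 6.0 * sd + abs(a)
    best = 0.0
    for k in range(grid_points):
        x = -half_width + 2.0 * half_width * k / (grid_points - 1)
        best = max(best, abs(normal_pdf(x, 0.0, sd) - normal_pdf(x, a, sd)))
    return best

for a, sigma in ((0.5, 0.5), (2.0, 0.1), (0.1, 3.0)):
    print(a, sigma, sup_density_gap(a, sigma), kolmogorov_distance_shifted_normals(a) / sigma)
```

In this example the left side is far below the bound, which is expected since F and G are already smooth; the inequality is aimed at arbitrary F and G.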

Combining Propositions A.1.2 and A.2.1, one may bound $\sup_x |p_\sigma(x) - q_\sigma(x)|$ in terms of
the Levy distance $L(F,G)$, as well. However, in order to get rid of the unnecessary parameter
B, one may argue as follows. Recall that

$$p_\sigma(x) - q_\sigma(x) = \frac{1}{\sigma^2}\int_{-\infty}^{\infty} \big(G(x-y) - F(x-y)\big)\,y\,\varphi_\sigma(y)\,dy.$$

From the definition of $h = L(F,G)$, it follows that $|G(u) - F(u)| \le (G(u+h) - G(u-h)) + h$,
for all $u \in \mathbb{R}$, which gives

$$\sigma^2\,|p_\sigma(x) - q_\sigma(x)| \le \int_{-\infty}^{\infty} \big[(G(x-y+h) - G(x-y-h)) + h\big]\,|y|\,\varphi_\sigma(y)\,dy$$

$$\le \frac{1}{\sqrt{2\pi e}}\int_{-\infty}^{\infty} \big(G(x-y+h) - G(x-y-h)\big)\,dy + h\int_{-\infty}^{\infty} |y|\,\varphi_\sigma(y)\,dy = \frac{2h}{\sqrt{2\pi e}} + h\,\frac{2\sigma}{\sqrt{2\pi}}.$$

Here we used the property that the function $|y|\,\varphi_\sigma(y)$ is maximized at $y = \pm\sigma$. Simplifying
absolute factors, the right-hand side can be bounded by $\frac{h}{2} + \sigma h$. We thus obtained:

Proposition A.2.3. We have

$$\sup_x |p_\sigma(x) - q_\sigma(x)| \le \frac{1}{\sigma}\Big(1 + \frac{1}{2\sigma}\Big)\,L(F,G).$$
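The absolute factors in the argument leading to Proposition A.2.3 can be checked numerically. The sketch below, using hypothetical function names, confirms on a grid that $|y|\varphi_\sigma(y)$ has maximum $1/\sqrt{2\pi e}$ for every $\sigma$, and compares the unsimplified factor $\frac{2}{\sigma^2\sqrt{2\pi e}} + \frac{2}{\sigma\sqrt{2\pi}}$ with the simplified one, read here as $\frac{1}{\sigma}(1 + \frac{1}{2\sigma})$, and with $\frac{3}{2\sigma^2}$ when $\sigma \le 1$.

```python
import math

def abs_y_times_density(y: float, sigma: float) -> float:
    """|y| * phi_sigma(y), where phi_sigma is the N(0, sigma^2) density."""
    return abs(y) * math.exp(-y * y / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def grid_maximum(sigma: float, grid_points: int = 20001) -> float:
    """Maximum of |y| * phi_sigma(y) over a grid on [0, 6 sigma]."""
    return max(abs_y_times_density(6.0 * sigma * k / (grid_points - 1), sigma)
               for k in range(grid_points))

def unsimplified_factor(sigma: float) -> float:
    """Factor of h before simplification: 2 / (sigma^2 sqrt(2 pi e)) + 2 / (sigma sqrt(2 pi))."""
    return 2.0 / (sigma ** 2 * math.sqrt(2.0 * math.pi * math.e)) + 2.0 / (sigma * math.sqrt(2.0 * math.pi))

def simplified_factor(sigma: float) -> float:
    """Factor (1 / sigma) * (1 + 1 / (2 sigma)) (assumed reading of Proposition A.2.3)."""
    return (1.0 + 1.0 / (2.0 * sigma)) / sigma

for sigma in (0.2, 1.0, 3.0):
    print(sigma, grid_maximum(sigma), unsimplified_factor(sigma), simplified_factor(sigma))
```

The maximum does not depend on $\sigma$, which is why only the first absolute moment of $\varphi_\sigma$ contributes a factor of $\sigma$ to the bound.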


14. Appendix III: Special bounds for entropic distance to the normal

Let X be a random variable with mean zero and variance $\mathrm{Var}(X) = v^2$ ($v > 0$) and with a
bounded density p. In this section we derive bounds for the entropic distance $D(X)$ in terms of
the quadratic tail function $\delta_X(T) = \int_{\{|x| \ge T\}} x^2\, p(x)\,dx$ and another quantity, which is directly
responsible for the closeness to the normal law, $\Delta(X) = \operatorname{ess\,sup}_x\, (p(x) - \varphi_v(x))$. As before, $\varphi_v$
stands for the density of a normally distributed random variable $Z \sim N(0, v^2)$, and we write
$\varphi$ in the standard case $v = 1$. The functional $\Delta = \Delta(X)$ is homogeneous with respect to X
with power of homogeneity $-1$ in the sense that in general $\Delta(\lambda X) = \Delta(X)/\lambda$ ($\lambda > 0$). Hence,
the functional $\sqrt{\mathrm{Var}(X)}\,\Delta(X)$ is invariant under rescaling of the coordinates.

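To relate the two quantities, $D(X)$ and $\Delta = \Delta(X)$, first write $p(x) \le \varphi_v(x) + \Delta \le \frac{1}{v\sqrt{2\pi}} + \Delta$,
which gives $p(x)\,v\sqrt{2\pi} \le 1 + \Delta v\sqrt{2\pi}$. Hence,

$$\int_{-\infty}^{\infty} p(x)\log\frac{p(x)}{\varphi_v(x)}\,dx = \int_{-\infty}^{\infty} p(x)\Big[\log\big(p(x)\,v\sqrt{2\pi}\big) + \frac{x^2}{2v^2}\Big]dx \le \log\big(1 + \Delta v\sqrt{2\pi}\big) + \frac12.$$

Thus we have:


Proposition A.3.1. Let X be a random variable with mean zero and variance $\mathrm{Var}(X) = v^2$
($v > 0$), having a bounded density. Then

$$D(X) \le \log\big(1 + v\,\Delta(X)\sqrt{2\pi}\big) + \frac12.$$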


This estimate might be good, when both $D(X)$ and $\Delta(X)$ are large, but it cannot be used
to see that X is almost normal. So, we need to considerably refine Proposition A.3.1 for the
case, where $\Delta(X)$ is small. For definiteness, consider the standard case $v = 1$. Take any $T \ge 0$.
Using once more the bound $p(x)\sqrt{2\pi} \le 1 + \Delta\sqrt{2\pi}$, where $\Delta = \Delta(X)$, we may write

$$\int_{\{|x| \ge T\}} p(x)\log\frac{p(x)}{\varphi(x)}\,dx = \int_{\{|x| \ge T\}} p(x)\Big[\log\big(p(x)\sqrt{2\pi}\big) + \frac{x^2}{2}\Big]dx \le \log\big(1 + \Delta\sqrt{2\pi}\big) + \frac12\,\delta_X(T) \le \Delta\sqrt{2\pi} + \frac12\,\delta_X(T).$$

On the last step we used $\log(1 + t) \le t$ to simplify the bound.

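For $|x| \le T$, we use $\frac{p(x)}{\varphi(x)} \le 1 + \frac{\Delta}{\varphi(x)}$, so that $\log\frac{p(x)}{\varphi(x)} \le \log\big(1 + \frac{\Delta}{\varphi(x)}\big) \le \frac{\Delta}{\varphi(x)}$. This gives

$$p(x)\log\frac{p(x)}{\varphi(x)} \le \varphi(x)\log\Big(1 + \frac{\Delta}{\varphi(x)}\Big) + \Delta\log\Big(1 + \frac{\Delta}{\varphi(x)}\Big) \le \Delta + \Delta\log\big(1 + \Delta\sqrt{2\pi}\,e^{T^2/2}\big),$$

and after integration over $[-T, T]$

$$\int_{\{|x| \le T\}} p(x)\log\frac{p(x)}{\varphi(x)}\,dx \le 2\Delta T + 2\Delta T\log\big(1 + \Delta\sqrt{2\pi}\,e^{T^2/2}\big).$$

Collecting the two bounds, we arrive at:

Proposition A.3.2. Let X be a random variable with mean zero and variance $\mathrm{Var}(X) = 1$,
having a bounded density. For all $T \ge 0$,

$$D(X) \le \Delta(X)\Big[\sqrt{2\pi} + 2T + 2T\log\big(1 + \Delta(X)\sqrt{2\pi}\,e^{T^2/2}\big)\Big] + \frac12\,\delta_X(T).$$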
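To see the bound of Proposition A.3.2 at work, one can evaluate both sides for a concrete density. The sketch below uses an example chosen for this illustration, the symmetric mixture $\frac12 N(-m, s^2) + \frac12 N(m, s^2)$ with $s^2 = 1 - m^2$ (mean zero, variance one), and approximates $D(X)$, $\Delta(X)$ and $\delta_X(T)$ by simple quadrature on $[-12, 12]$. All names, the truncation of the integrals and the grid sizes are assumptions of the sketch.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def normal_pdf(x, mean=0.0, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * SQRT_2PI)

def mixture_pdf(x, m):
    """0.5 N(-m, s^2) + 0.5 N(m, s^2) with s^2 = 1 - m^2, so mean 0 and variance 1."""
    s = math.sqrt(1.0 - m * m)
    return 0.5 * (normal_pdf(x, -m, s) + normal_pdf(x, m, s))

def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def entropic_distance(m, L=12.0):
    """D(X): integral of p log(p / phi), truncated to [-L, L]."""
    return trapezoid(lambda x: mixture_pdf(x, m) * math.log(mixture_pdf(x, m) / normal_pdf(x)), -L, L)

def uniform_deviation(m, L=12.0, n=4000):
    """Grid approximation of Delta(X) = sup_x (p(x) - phi(x))."""
    return max(mixture_pdf(-L + 2.0 * L * i / n, m) - normal_pdf(-L + 2.0 * L * i / n)
               for i in range(n + 1))

def quadratic_tail(m, T, L=12.0):
    """delta_X(T): integral of x^2 p(x) over T <= |x| <= L, using the symmetry of p."""
    return 2.0 * trapezoid(lambda x: x * x * mixture_pdf(x, m), T, L)

def prop_a_3_2_bound(m, T):
    """Right-hand side of Proposition A.3.2 as read in this sketch."""
    d = uniform_deviation(m)
    bracket = SQRT_2PI + 2.0 * T + 2.0 * T * math.log1p(d * SQRT_2PI * math.exp(T * T / 2.0))
    return d * bracket + 0.5 * quadratic_tail(m, T)

for T in (0.0, 1.0, 2.0, 3.0):
    print(T, entropic_distance(0.5), prop_a_3_2_bound(0.5, T))
```

For this mixture the bound is far from tight, and the tail term $\frac12\delta_X(T)$ dominates for small $T$, which illustrates why $T$ has to grow as $\Delta(X)$ decreases.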


Hence, if $\Delta(X)$ is small and T is large, but not too large, the right-hand side can be made
small. When $\Delta(X)$ is sufficiently small, one may take $T = \sqrt{2\log(1/\Delta(X))}$, which leads to the estimate

$$D(X) \le C\,\Delta(X)\sqrt{\log(1/\Delta(X))} + \frac12\,\delta_X(T),$$

where C is an absolute constant. If X satisfies the tail condition $P\{|X| \ge t\} \le A\,e^{-t^2/2}$ ($t > 0$),
we have $\delta_X(T) \le cA\,(1 + T^2)\,e^{-T^2/2}$ and then $D(X) \le C\,\Delta(X)\log(1/\Delta(X))$, where C depends
on the parameter A only.